


EP0335562 



Publication Title: 

Architecture and organization of a high performance metropolitan area 
telecommunications packet network. 



Abstract: 

Abstract of EP0335562

A high capacity metropolitan area network (MAN) is described. Data traffic from 
users is connected to data concentrators at the edge of the network, and is 
transmitted over fiber optic data links to a hub where the data is switched. The 
hub includes a plurality of data switching modules, each having a control means, 
and each connected to a distributed control space division switch.
Advantageously, the data switching modules, whose inputs are connected to the 
concentrators, perform all checking and routing functions, while the 1024x1024 
maximum size space division switch, whose outputs are connected to the 
concentrators, provides a large fan-out distribution network for reaching many 
concentrators from each data switching module. Distributed control of the space 
division switch permits several million connection and disconnection actions to be 
performed each second, while the pipelined and parallel operation within the 
control means permits each of the 256 switching modules to process at least 
50,000 transactions per second. The data switching modules chain groups of 
incoming packets destined for a common outlet of the space division switch so 
that only one connection in that switch is required for transmitting each group of 
chained packets from a data switching module to a concentrator. MAN provides
security features including a port identification supplied by the data 
concentrators, and a check that each packet is from an authorized source user, 
transmitting on a port associated with that user, to an authorized destination user 
that is in the same group (virtual network) as the source user. This arrangement
can also be used to switch voice packets, using a voice interface such as a 
digital switch and a digital voice signal to voice packet converter. In accordance
with one embodiment of the invention, a packet switch is used for switching voice
packet outputs of the data switching modules and a circuit switch, such as the 
space division switch, is used for switching data packet outputs. In accordance 
with another embodiment, voice packets are switched from the data switching 
modules through the space division switch to a small group of data switching 
modules, which further switch the voice packets through the circuit switch to a 
destination concentrator.






Europäisches Patentamt
European Patent Office
Office européen des brevets



Publication number: 0 335 562 A2



EUROPEAN PATENT APPLICATION

Application number: 89302779.7    Int. Cl.: H04L 11/20, H04L 11/00

Date of filing: 21.03.89



Priority: 31.03.88 US 175694
31.03.88 US ITSS^e
31.03.88 US 175547
30.08.88 US 238309

Date of publication of application:
04.10.89 Bulletin 89/40

Designated Contracting States:
BE DE ES FR GB IT NL SE



Applicant: AMERICAN TELEPHONE AND
TELEGRAPH COMPANY
550 Madison Avenue
New York, NY 10022 (US)

Inventor: Hemmady, Jayant Gurudatta
1474 Culpepper Drive
Naperville Illinois 60540 (US)
Inventor: Lidinsky, William Paul
10S223 HldgsRoad
Naperville Illinois 60565 (US)
Inventor: Nichols, Robert Kells
1N712 Forest Avenue
Glen Ellyn Illinois 60137 (US)
Inventor: Richards, Gaylord Warner
7 South 560 Green Acres Drive
Naperville Illinois 60540 (US)
Inventor: Roediger, Gary Arthur
5421 Maplewood Place
Downers Grove Illinois 60515 (US)
Inventor: Steele, Scott Blair
11S072 Sherf Street
Naperville Illinois 60565 (US)
Inventor: Weddige, Ronald Clare
405S Linden Avenue
Western Springs Illinois 60558 (US)
Inventor: Zelle, Bruce Ronald
1S31 Foxhill Road
Naperville Illinois 60540 (US)
Inventor: Ulrich, Werner
434 Maple Street
Glen Ellyn Illinois 60137 (US)



Representative: Watts, Christopher
Kelway, Dr. et al
Western Electric Company Limited 5,
Mornington Road
Woodford Green Essex, IG8 0TU (GB)



Architecture and organization of a high performance metropolitan area telecommunications packet
network.









EP 0 335 562 A2 



ARCHITECTURE AND ORGANIZATION OF A HIGH PERFORMANCE METROPOLITAN AREA
TELECOMMUNICATIONS PACKET NETWORK



Technical Field

This invention relates to packetized data and voice networks.


Problem

In data processing systems involving a large amount of distributed computing, featuring large numbers
of computers and including increasing numbers of personal computers, workstations, and data bases, it is
frequently necessary to exchange a great deal of data among these data processing systems. These
exchanges require communications networks. Such networks, referred to as metropolitan area networks,
when used for interconnecting data processing systems in an area beyond the geographical scope of local
area networks but less than the geographical scope of wide area networks, require data networks capable of
transmitting data and telecommunications traffic at a very high bit rate with low latency.

One type of metropolitan area network is a network composed of one or more interconnected data rings
such as the FDDI (Fiber Distributed Data Interface) network. The basic element of the FDDI network is a
data ring capable of transmitting data at 80 megabits/second to user nodes connected to each such ring.
These rings may be interconnected by providing inter-ring nodes which allow a transfer of data from one
ring to another.

Integrated telephone voice and data switching systems are becoming available for offering customers
integrated services digital network (ISDN) service. In such systems, data is frequently switched by switching
data packets using packet switching techniques. The use of packet switching techniques for also switching
voice signals converted into packets has been suggested, for example, in J. S. Turner: U.S. Patent
4,491,945 (Turner). Such arrangements offer the opportunity to take advantage of the high speed of modern
microelectronic circuitry.

A problem of such data and voice networks is that if there is no predictable community of interest
among the user stations or if there is a high community of interest among stations that are geographically
far apart, much of the data traffic must be transmitted over several rings thus decreasing the data transfer
speed and limiting the total data bandwidth of the metropolitan area network. Further, such networks
encounter a high data latency because each node on the ring in a metropolitan area network introduces
delay; in a network having rings with many nodes and having many messages which require transmission
over several rings, the delay in transmitting a data message from one station to another can be
unacceptably long. There is no satisfactory large data network having low latency for the transmission of
data messages between any pair of terminals connected to the network and having the capability of
transmitting high priority data messages with especially low latency. Reliability is another problem
encountered in such networks. Because all nodes of a ring must work properly for any message to be sent
around the entire ring, it is necessary to provide repair access to each node. The provision of repair access
can add substantial delay at each node thus increasing the latency of data transmitted over the network; in
a typical installation each node is brought to a wiring closet so that the node may be bypassed at a readily
accessible point. A recognized problem in the prior art, therefore, is that there is no data network capable of
serving a metropolitan area, having low data latency between any pair of terminals and having a very high
total data transfer rate, that is also capable of serving voice terminals, stations and data bases with
unpredictable and varying communities of interest.



Solution

The above problems are solved and an advance is made over the prior art in accordance with the
principles of this invention which features a data distribution stage for chaining data packets destined for a
common outlet of a circuit switch, and a high-speed, low setup time circuit switching stage for switching the
output of the data distribution stage. Advantageously, the circuit switching stage can be quite large, using
present technology, and can therefore allow a very high total data throughput by providing at any instant of
time a large number of separate paths over each of which data can be transmitted at high-speed data
transmission speeds. Advantageously, only data transmission is performed in the circuit switch thus
permitting a high data throughput rate over each separate path. Advantageously the distributed processing
performed in the distribution stage allows data messages destined for a particular output link of the circuit
switch to be recognized and chained.

In one embodiment, the circuit switch is a space division switch. Advantageously, the data transmission
rate through each path of such a switch is high, being limited primarily by the characteristics of circuits
connected to the two sides of the switch.

In one embodiment of the invention, user ports are connected to a data concentration switch.
Advantageously, the use of an initial stage of data concentration allows the characteristics of different types
of users to be matched to the standard data rate of a data transmission medium such as an optic fiber for
connecting the data concentration switch to the data distribution switch. Advantageously, the delays to any
user in using the network are limited to delay associated with the concentration stage, plus delay for
buffering messages and setting up a connection in the central space division stage, plus propagation delay.
Delay is limited to buffering and transmission propagation delay if the concentrator, at the time a user is
transmitting a message, has bandwidth available.

A large variety of different kinds of users may be attached to the network. These users include:
workstations, including simple terminals, personal computers, and engineering design workstations;
computers, including microcomputers, minicomputers, mainframes, and supercomputers, including the
computers of a distributed computing system; data base servers for accessing large data bases; computer
servers for performing special types of operations such as floating point arithmetic or matrix operations;
gateway ports for accessing other networks; voice packet assemblers/disassemblers for communicating
telephone signals; and special interconnection facilities for interconnecting two or more metropolitan area
networks.

In this embodiment of the invention, the output of each concentration source multiplexer is transmitted
to the distribution stage where messages destined for each destination demultiplexer connected to user
input ports are buffered in chained blocks of memory. The output of the distribution stage, representing
messages for a given destination demultiplexer, is then switched by the space division switching stage
directly to that demultiplexer. Advantageously, in this arrangement, data is buffered only in three places: in
a user system to wait for data transmission resources in the concentrator; in the distribution stage to
assemble data for each destination demultiplexer; and in an interface to a user in order to collect all data
messages destined for that user.
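
The chaining performed in the distribution stage can be illustrated with a minimal Python sketch. It is
not the implementation described in this specification: the function names, the (outlet, payload) tuple
layout, and the use of in-memory queues are assumptions made purely for illustration. The point shown is
that one circuit switch connection is needed per chain rather than per packet.

```python
from collections import defaultdict, deque

def chain_by_outlet(packets):
    """Group (outlet, payload) pairs into per-outlet chains, keeping arrival order.

    The tuple layout and names are hypothetical; the source only states that
    messages for each destination demultiplexer are buffered in chained blocks.
    """
    chains = defaultdict(deque)
    for outlet, payload in packets:
        chains[outlet].append(payload)
    return chains

def connections_needed(chains):
    """One circuit switch connection is requested per non-empty chain."""
    return sum(1 for chain in chains.values() if chain)

if __name__ == "__main__":
    arrivals = [(3, "a"), (7, "b"), (3, "c"), (3, "d")]
    chains = chain_by_outlet(arrivals)
    print(dict((k, list(v)) for k, v in chains.items()))
    print(connections_needed(chains))
```

In this sketch four arriving packets for two outlets require two connections instead of four.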

In one embodiment of the invention, data packets from a plurality of user systems are concentrated on
to a group of high-speed data links connected to the data switching hub. If the first packet that is destined
for a particular output of the circuit switch is a high priority packet, then the request for a connection to the
destination of that packet becomes a high priority request and is honored before other requests to the
circuit switch. Advantageously, this arrangement gives a very fast response time to all packets under normal
load and gives a fast response to priority packets even under high overload.
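
The priority rule for connection requests can be sketched as a two-level queue. This is a hedged
illustration only: the class name, the method names, and the choice of a binary heap are assumptions, not
the control structure of the circuit switch described here. A request is marked high priority when the
first packet of its chain is high priority, and high priority requests are served before normal ones, in
arrival order within each level.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower value is served first

class ConnectionRequestQueue:
    """Hypothetical request queue for circuit switch connections."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, outlet, first_packet_high_priority):
        level = HIGH if first_packet_high_priority else NORMAL
        heapq.heappush(self._heap, (level, next(self._seq), outlet))

    def next_outlet(self):
        """Return the outlet of the next request to honor."""
        return heapq.heappop(self._heap)[2]

if __name__ == "__main__":
    q = ConnectionRequestQueue()
    q.submit(5, False)
    q.submit(9, True)
    q.submit(2, False)
    print([q.next_outlet() for _ in range(3)])
```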

Packetized voice signals are switched using a data switching module that includes a group of banks of
memory for storing consecutive words of a packet, a group of packet input and a group of packet output
handlers and means for distributing data from each of the input handlers to the memory and from the
memory to each of the output handlers.

In this specific embodiment, the basic operating speed of each fiber optic link is about 150
megabits/second. Each data distribution switch of the distribution stage has four optic fiber inputs and four
optic fiber outputs. Up to 250 distribution switches can be provided for one metropolitan area network. The
space division switch, therefore, has up to 1,000 input fiber optic links and 1,000 output fiber optic links. As
explained above, these output fiber optic links are connected to demultiplexers for accessing the input user
ports.
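
The link counts above follow from simple multiplication, sketched below in Python. The constant and
function names are illustrative assumptions. The aggregate figure is a derived upper bound that assumes
every link runs at the nominal 150 megabits/second at once.

```python
LINK_RATE_MBPS = 150             # nominal rate of each fiber optic link
FIBERS_PER_SWITCH = 4            # inputs (and outputs) per distribution switch
MAX_DISTRIBUTION_SWITCHES = 250  # upper limit stated for this embodiment

def switch_ports(distribution_switches):
    """Input (or output) links on the space division switch."""
    return distribution_switches * FIBERS_PER_SWITCH

def aggregate_gbps(distribution_switches):
    """Upper bound on aggregate bit rate, assuming all links are busy."""
    return switch_ports(distribution_switches) * LINK_RATE_MBPS / 1000

if __name__ == "__main__":
    print(switch_ports(MAX_DISTRIBUTION_SWITCHES))
    print(aggregate_gbps(MAX_DISTRIBUTION_SWITCHES))
```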

In an alternative embodiment, data packets representing voice signals (voice packets) are switched from
the data switching modules through a data switch in order to avoid the circuit set up time limitations of a
circuit switch. High priority data packets, and, optionally, any single packet messages, can also be switched
through the data switch. Advantageously, the relatively short voice packets can be separated from the data
packets representing data, the latter having less rigorous switching delay requirements and, on average,
being much longer.

In another alternative embodiment, groups of voice packets are switched from a data and voice packet
switch through the space division switch to ones of a group of specialist voice packet switching modules
which collect and further switch the groups of voice packets through the circuit switch for connection to the
destination. Advantageously, in such an arrangement, voice packets from a source voice and data packet
switch destined for a group of destinations can first be assembled into groups of packets destined for a
particular specialist voice packet switch, and voice packets from many voice and data packet switches can
then be assembled in each voice packet switch into groups destined for a particular destination.
Advantageously, the number of circuit switch connections required per voice packet switching interval (i.e., the
interval between successive voice packets to a particular receiving customer station) is sharply reduced
from the number of connections required for switching such voice packets directly from an initial data
switching module to an outlet of the circuit switch for transmission to a destination.

In one embodiment, a local switch is part of the interface between customer voice signals, in analog or
digital form, and a packet switching system. The digital output signals from the voice switch are placed on
trunks which are connected to a packet assembler/disassembler (PAD) for packetizing and unpacketizing
these signals. Advantageously, such an arrangement permits the complex voice interfaces and control
software of a local switch to be used while offering the advantage of a centralized data switching hub for
distributing the voice traffic widely. Advantageously, in such an arrangement, data signals from customers
can be readily connected to the data switching hub.

For some sources, such as digital private branch exchanges (PBXs), a direct connection is made to the
PAD.

In an alternative embodiment, messages for each destination distribution unit are collected within each
source distribution unit. These messages are then sent from the source distribution unit to the destination
distribution unit through the space division switch. Each destination distribution unit then distributes
received messages to the destination demultiplexer or, for a high-speed destination user, directly to the
destination user.



Brief Description of the Drawing

FIG. 1 is a graphic representation of the characteristics of the type of communications traffic in a
metropolitan area network.

FIG. 2 is a high level block diagram of an exemplary metropolitan area network (referred to herein as
MAN) including typical input user stations that communicate via such a network.

FIG. 3 is a more detailed block diagram of the hub of MAN and the units communicating with that
hub.

FIGS. 4 and 5 are block diagrams of MAN illustrating how data flows from input user systems to the
hub of MAN and back to output user systems.

FIG. 6 is a simplified illustrative example of a type of network which can be used as a circuit switch
in the hub of MAN.

FIG. 7 is a block diagram of an illustrative embodiment of a MAN circuit switch and its associated
control network.

FIGS. 8 and 9 are flowcharts representing the flow of requests from the data distribution stage of the
hub to the controllers of the circuit switch of the hub.

FIG. 10 is a block diagram of one data distribution switch of a hub.

FIGS. 11-14 are block diagrams and data layouts of portions of the data distribution switch of the
hub.

FIG. 15 is a block diagram of an operation, administration, and maintenance (OA&M) system for
controlling the data distribution stage of the hub.

FIG. 16 is a block diagram of an interface module for interfacing between end user systems and the
hub.

FIG. 17 is a block diagram of an arrangement for interfacing between an end user system and a
network interface.

FIG. 18 is a block diagram of a typical end user system.

FIG. 19 is a block diagram of a control arrangement for interfacing between an end user system and
the hub of MAN.

FIG. 20 is a layout of a data packet arranged for transmission through MAN, illustrating the MAN
protocol.

FIG. 21 illustrates an alternate arrangement for controlling access from the data distribution switches
to the circuit switch control.

FIG. 22 is a block diagram illustrating arrangements for using MAN to switch voice as well as data.

FIG. 23 illustrates an arrangement for synchronizing data received from the circuit switch by one of
the data distribution switches.

FIGS. 24 and 26 illustrate an alternate arrangement for the hub for switching packetized voice and
data.

FIG. 25 is a block diagram of a MAN circuit switch controller.



General Description 

The Detailed Description of this specification is a description of an exemplary metropolitan area network
(referred to herein as MAN) that incorporates the present invention. Such a network as shown in FIGS. 2
and 3 includes an outer ring of network interface modules (NIMs) 2 connected by fiber optic links 3 to a hub
1. The hub interconnects data and voice packets from any of the NIMs to any other NIM. The NIMs, in turn,
are connected via interface modules to user devices connected to the network.

The invention embodied in the Detailed Description relates to the hub of the network. While the entire
Detailed Description supports the invention as claimed, that portion which deals with FIGS. 3-5 and 10-15 is
especially pertinent to the architecture of the hub.



Detailed Description



1 INTRODUCTION

Data networks often are classified by their size and scope of ownership. Local area networks (LANs) are
usually owned by a single organization and have a reach of a few kilometers. They interconnect tens to
hundreds of terminals, computers, and other end user systems (EUSs). At the other extreme are wide area
networks (WANs) spanning continents, owned by common carriers, and interconnecting tens of thousands
of EUSs. Between these extremes other data networks have been identified whose scope ranges from a
campus to a metropolitan area. The high performance metropolitan area network to be described herein will
be referred to as MAN. A table of acronyms and abbreviations is found in Appendix A.

Metropolitan area networks serve a variety of EUSs ranging from simple reporting devices and low
intelligence terminals through personal computers to large mainframes and supercomputers. The demands
that these EUSs place on a network vary widely. Some may issue messages infrequently while others may
issue many messages each second. Some messages may be only a few bytes while others may be files of
millions of bytes. Some EUSs may require delivery any time within the next few hours while others may
require delivery within microseconds.

This invention of a metropolitan area network is a computer and telephone communications network that
has been designed for transmitting broadband low latency data which retains and indeed exceeds the
performance characteristics of the highest performance local area networks. A metropolitan area network has
size characteristics similar to those of a class 5 or end-office telephone central office; consequently, with
respect to size, a metropolitan area network can be thought of as an end-office for data. The exemplary
embodiment of the invention, hereinafter called MAN, was designed with this in mind. However, MAN also
fits well either as an adjunct to or as part of a switch module for an end-office, thus supporting broadband
Integrated Services Digital Network (ISDN) services. MAN can also be effective as either a local area or
campus area network. It is able to grow gracefully from a small LAN through campus sized networks to a
full MAN.

The rapid proliferation of workstations and their servers, and the growth of distributed computing are
major factors that motivated the design of this invention. MAN was designed to provide networking for tens
of thousands of diskless workstations and servers and other computers over tens of kilometers, where each
user has tens to hundreds of simultaneous and different associations with other computers on the network.
Each networked computer can concurrently generate tens to hundreds of messages per second, and
require I/O rates of tens to hundreds of millions of bits/second (Mbps). Message sizes may range from
hundreds of bits to millions of bits. With this level of performance, MAN is capable of supporting remote
procedure calls, interobject communications, remote demand paging, remote swapping, file transfer, and
computer graphics. The goal is to move most messages (or transactions as they will be referred to
henceforth) from an EUS memory to another EUS memory within less than a millisecond for small
transactions and within a few milliseconds for large transactions. FIG. 1 classifies transaction types and
shows desired EUS response times as a function of both transaction type and size: simple (i.e., low
intelligence) terminals 70, remote procedure calls (RPCs) and interobject communications (IOCs) 72,
demand paging 74, memory swapping 76, animated computer graphics 78, computer graphics still pictures
80, file transfers 82, and packetized voice 84. Meeting the response time/transaction speeds of FIG. 1
represents part of the goals of the MAN network. As a calibration, lines of constant bit rate are shown where
the bit rate is likely to dominate the response time. MAN has an aggregate bit rate of 150 gigabits per
second and can handle 20 million network transactions per second with the exemplary choice of the
processor elements shown in FIG. 14. Furthermore, it has been designed to handle traffic overloads
gracefully.
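
Where bit rate dominates the response time, the time to move a transaction is simply its size divided by
the link rate. The following Python sketch illustrates that relation; it is not taken from FIG. 1, and
the function name and the sample sizes are assumptions chosen to span the range of message sizes
mentioned above. Propagation, buffering, and switch setup delays are deliberately ignored.

```python
def transfer_time_ms(size_bits, rate_mbps):
    """Time to clock size_bits onto a link running at rate_mbps, in milliseconds."""
    return size_bits / (rate_mbps * 1_000_000) * 1_000

if __name__ == "__main__":
    # Sample sizes are illustrative: hundreds of bits up to millions of bits.
    for size_bits in (500, 1_500, 100_000, 1_000_000):
        print(size_bits, "bits at 150 Mbps:", transfer_time_ms(size_bits, 150), "ms")
```

At a 150 Mbps link rate a transaction of a million bits occupies the link for several milliseconds, which
is consistent with the stated goal of a few milliseconds for large transactions.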

MAN is a network which performs switching and routing as many systems do, but also addresses a
myriad of other necessary functions such as error handling, user interfacing, and the like. Significant privacy
and security features in MAN are provided by an authentication capability. This capability prevents
unauthorized network use, enables usage-sensitive billing, and provides non-forgeable source identification
for all information. Capability also exists for defining virtual private networks.

MAN is a transaction-oriented (i.e., connectionless) network. It does not need to incur the overhead of
establishing or maintaining connections although a connection veneer can be added in a straightforward
fashion, if desired.

MAN can also be used for switching packetized voice. Because of the short delay in traversing the
network, the priority which may be given to the transmission of single packet entities, and the low variation
of delay when the network is not heavily loaded, voice or a mixture of voice and data can be readily
supported by MAN. For clarity, the term data as used hereinafter includes digital data representing voice
signals, as well as digital data representing commands, numerical data, graphics, programs, data files and
other contents of memory.

MAN, though not yet completely built, has been extensively simulated. Many of the capacity estimates
presented hereinafter are based on these simulations.



2 ARCHITECTURE AND OPERATION 



2.1 Architecture 

The MAN network is a hierarchical star architecture with two or three levels depending upon how
closely one looks at the topology. FIG. 2 shows the network as consisting of a switching center called a hub
1 linked to network interface modules 2 (NIMs) at the edge of the network.

The hub is a very high performance transaction store-and-forward system that gracefully grows from a
small four link system to something very large that is capable of handling over 20 million network
transactions per second and that has an aggregate bit rate of 150 gigabits per second.

Radiating out from the hub for distances of up to tens of kilometers are optical fibers (or alternative data
channels) called external links (XLs) (connect NIM to MINT), each capable of handling full duplex bit rates
on the order of 150 megabits per second. An XL terminates in a NIM.

A NIM, the outer edge of which delineates the edge of the network, acts as a concentrator/demultiplexer
and also identifies network ports. It concentrates when moving information into the network and
demultiplexes when moving information out of the network. Its purpose in concentrating/demultiplexing is to
interface multiple end user systems 26 (EUSs) to the network in such a way as to use the link efficiently
and cost effectively. Up to 20 EUSs 26 can be supported by each NIM depending upon the EUSs'
networking needs. Examples of such EUSs are the increasingly common advanced function workstations 4
where the burst rates are already in the 10 Mbps range (with the expectation that much faster systems will
soon be available) with average rates orders of magnitude lower. If the EUS needs an average rate that is
closer to its burst rate and the average rates are of the same order of magnitude as that of a NIM, then a
NIM can either provide multiple interfaces to a single EUS 26 or can provide a single interface with the
entire NIM and XL dedicated to that EUS. Examples of EUSs of this type include large mainframes 5 and
file servers 6 for the above workstations, local area networks such as ETHERNET® 8 and high performance
local area networks 7 such as Proteon® 80, an 80 MBit token ring manufactured by Proteon Corp., or a
system using a fiber distributed data interface (FDDI), an evolving American National Standards Institute
(ANSI) standard protocol ring interface. In the latter two cases, the LAN itself may do the concentration and
the NIM then degenerates to a single port network interface module. Lower performance local area networks
such as ETHERNET 8 and IBM token rings may not need all of the capability that an entire NIM provides.
In these cases, the LAN, even though it concentrates, may connect to a port 8 on a multiport NIM.

Within each EUS there is a user interface module (UIM) 13. This unit serves as a high bit rate direct
memory access port for the EUS and as a buffer for transactions received from the network. It also offloads
the EUS from MAN interface protocol concerns. Closely associated with the UIM is the MAN EUS-resident
driver. It works with the UIM to format outgoing transactions, receive incoming transactions, implement
protocols, and interface with the EUS's operating system.

A closer inspection (see FIG. 3) of the hub reveals two different functional units - a MAN switch (MANS)
10 and one or more memory interface modules 11 (MINTs). Each MINT is connected to up to four NIMs via
XLs 3 and thus can accommodate up to 80 EUSs. The choice of four NIMs per MINT is based upon a
number of factors including transaction handling capacity, buffer memory size within the MINT, growability
of the network, failure group size, and aggregate bit rate.

Each MINT is connected to the MANS by four internal links 12 (ILs) (connect MINT and MAN switch),
one of which is shown for each of the MINTs in FIG. 3. The reason for four links in this case is different
than it is for the XLs. Here multiple links are necessary because the MINT will normally be sending
information through the MANS to multiple destinations concurrently; a single IL would present a bottleneck.
The choice of 4 ILs (as well as many other design choices of a similar nature) was made on the basis of
extensive analytical and simulation modeling. The ILs run at the same bit rate as the external links but are
very short since the entire hub is co-located.

The smallest hub consists of one MINT with the ILs looped back and no switch. A network based upon
this hub includes up to four NIMs and accommodates up to 80 EUSs. The largest hub that is currently
envisioned consists of 256 MINTs and a 1024 x 1024 MANS. This hub accommodates 1024 NIMs and up to
20,000 EUSs. By adding MINTs and growing the MANS, the hub and ultimately the entire network grows
very gracefully.
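
The sizing figures above follow from a few per-MINT constants. The sketch below is only an illustration of that arithmetic; the function name `hub_capacity` is hypothetical, and the figure of 20 EUSs per NIM is inferred from "80 EUSs" per four-NIM MINT. Note that the exact product for 256 MINTs is 20,480 EUSs, which the text rounds to 20,000.

```python
# Per-MINT constants taken from the description; names are illustrative only.
NIMS_PER_MINT = 4   # each MINT connects to up to four NIMs
EUS_PER_NIM = 20    # inferred: 80 EUSs per MINT / 4 NIMs
ILS_PER_MINT = 4    # four internal links per MINT


def hub_capacity(mints: int) -> dict:
    """Return upper bounds for a hub built from `mints` MINTs."""
    nims = mints * NIMS_PER_MINT
    return {
        "nims": nims,
        "eus": nims * EUS_PER_NIM,
        # One switch port per internal link; a single-MINT hub loops its
        # ILs back and needs no switch, so it is reported as 0 ports.
        "switch_ports": mints * ILS_PER_MINT if mints > 1 else 0,
    }


smallest = hub_capacity(1)    # 4 NIMs, 80 EUSs, no switch
largest = hub_capacity(256)   # 1024 NIMs, 1024 x 1024 switch
```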

2.1.1 LUWUs, Packets, SUWUs, and Transactions


Before going further several terms need to be discussed. EUS transactions are transfers of units of EUS
information that are meaningful to the EUS. Such transactions might be a remote procedure call consisting
of a few bytes or the transfer of a 10 megabyte database. MAN recognizes two EUS transaction unit sizes
that are called long user work units (LUWUs) and short user work units (SUWUs) for the purposes of this
description. While the delimiting size is easily engineerable, usually transaction units of a couple of
thousand bits or less are considered SUWUs while larger transaction units are LUWUs. Packets are given
priority within the network to reduce response time based upon criteria shown in FIG. 1 where it can be
seen that the smaller EUS transaction units usually need faster EUS transaction response times. Packets
are kept intact as a single frame or packet as they move through the network. LUWUs are fragmented into
frames or packets, called packets hereinafter, by the transmitting UIM. Packets and SUWUs are sometimes
collectively referred to as network transaction units.
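
The relationship between EUS transactions and network transaction units can be sketched as follows. This is a simplified illustration, not the UIM's actual logic: the 2000-bit delimiting size stands in for "a couple of thousand bits", the 8000-bit maximum packet size is purely hypothetical, and the function name `to_network_units` is invented for the example.

```python
SUWU_LIMIT_BITS = 2000   # assumed delimiting size ("a couple of thousand bits")
MAX_PACKET_BITS = 8000   # hypothetical maximum packet size, not from the text


def to_network_units(size_bits: int) -> list:
    """Split one EUS transaction into (kind, size_bits) network transaction units."""
    if size_bits <= SUWU_LIMIT_BITS:
        return [("SUWU", size_bits)]      # short units are not fragmented
    units = []
    remaining = size_bits
    while remaining > 0:
        chunk = min(remaining, MAX_PACKET_BITS)
        units.append(("packet", chunk))   # the last packet may be shorter
        remaining -= chunk
    return units
```

Under these assumed sizes, a 20,000-bit LUWU becomes two full packets and one shorter final packet, while a 1000-bit transaction stays a single SUWU.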

Transfers through the MAN switch are referred to as switch transactions and the units transferred
through the MANS are switch transaction units. They are composed of one or more network transaction
units destined for the same NIM.



2.2 Functional Unit Overview 

Prior to discussing the operation of MAN, it is useful to provide a brief overview of each major functional
unit within the network. The units described are the UIM 13, NIM 2, MINT 11, MANS 10, end user system
link (connects NIM and UIM) (EUSL) 14, XL 3, and IL 12 respectively. These units are depicted in FIG. 4.



2.2.1 User Interface Module - UIM 13

This module is located within the EUS and often plugs onto an EUS backplane such as a VME® bus
(an IEEE standard bus), an Intel MULTIBUS II®, or a mainframe I/O channel. It is designed to fit on one printed
circuit board for most applications. The UIM 13 connects to the NIM 2 over a duplex optical fiber link called
the EUS link 14 (EUSL), driven by optical transmitters 97 and 85. This link runs at the same speed as the
external link (XL) 3. The UIM has a memory queue 15 used to store information on its way to the network.
Packets and SUWUs are stored and forwarded to the NIM using out-of-band flow control.

By way of contrast, a receive buffer memory 80 must exist to receive information from the network. In
this case entire EUS transactions may sometimes be stored until they can be transferred into End User
System memory. The receive buffer must be capable of dynamic buffer chaining. Partial EUS transactions
may arrive concurrently in an interleaved fashion.

Optical receiver 87 receives signals from optical link 14 for storage in receive buffer memory 80.
Control 25 controls UIM 13, and controls exchange of data between transmit first-in-first-out (FIFO) queue
15 or receive buffer memory 80 and a bus interface for interfacing with bus 92 which connects to end user
system 26. The details of the control of UIM 13 are shown in FIG. 19.



2.2.2 Network Interface Module - NIM 2


A NIM 2 is the part of MAN that is at the edge of the network. A NIM performs six functions: (1)
concentration/demultiplexing including queuing of packets and SUWUs moving toward the MINT and
external link arbitration, (2) participation in network security using port identification, (3) participation in
congestion control, (4) EUS-to-network control message identification, (5) participation in error handling, and
(6) network interfacing. Small queues 94 in memory similar to those 15 found in the UIM exist for each End
User System. They receive information from the UIM via link 14 and receiver 68 and store it until XL 3 is
available for transmission to the MINT. The outputs of these queues drive a data concentrator 95 which in
turn drives an optical transmitter 86. An external link demand multiplexer exists which services demands for
the use of the XL. The NIM prefixes a port identification number 600 (FIG. 20) to each network transaction
unit flowing toward the MINT. This is used in various ways to provide value added services such as reliable
and non-fraudulent sender identification and billing. This prefix is particularly desirable for ensuring that
members of a virtual network are protected from unauthorized access by outsiders. A check sequence is
processed for error control. The NIM, working with the hub 1, determines congestion status within the
network and controls flow from the UIMs under high congestion conditions. The NIM also provides a
standard physical and logical interface to the network including flow control mechanisms.

Information flowing from the network to the EUS is passed through the NIM via receiver 69, distributed
to the correct UIM by data distributor 86, and sent to destination UIM 13 by transmitter 85 via link 14. No
buffering is done at the NIM.

There are only two types of NIMs. One type (such as shown in FIG. 4 and the upper right of FIG. 3)
concentrates while the other type (shown at the lower right of FIG. 3) does not.
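
The concentrating direction of a NIM (per-EUS queues feeding one external link, with a port number prefixed to each unit) can be pictured with the toy model below. It is a sketch under stated assumptions: the class name `NimSketch` is hypothetical, and simple round-robin service is used only as a stand-in for the demand multiplexer, whose actual arbitration policy is not given in this section.

```python
from collections import deque


class NimSketch:
    """Toy model of the inbound (EUS to MINT) half of a concentrating NIM."""

    def __init__(self, ports: int):
        # One small queue per End User System port.
        self.queues = [deque() for _ in range(ports)]

    def receive_from_uim(self, port: int, unit: bytes) -> None:
        self.queues[port].append(unit)

    def drain_to_xl(self) -> list:
        """Serve the port queues round-robin (an assumed policy),
        prefixing each network transaction unit with its port number."""
        sent = []
        while any(self.queues):
            for port, q in enumerate(self.queues):
                if q:
                    sent.append((port, q.popleft()))
        return sent
```

The outbound direction is deliberately not modelled, since the text states that no buffering is done at the NIM on the way out of the network.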



2.2.3 Memory and Interface Module - MINT 11

MINTs are located in the hub. Each MINT 11 consists of (a) up to four external link handlers 16 (XLHs)
that terminate XLs and also receive signals from the half of the internal link that moves data from the switch
10 to the MINT; (b) four internal link handlers 17 (ILHs) that generate data for the half of the IL that moves
data from a MINT to the switch; (c) a memory 18 for storing data while awaiting a path from the MINT
through the switch to the destination NIM; (d) a Data Transport Ring 19 that moves data between the link
handlers and the memory and also carries MINT control information; and (e) a control unit 20.

All functional units within the MINT are designed to accommodate the peak aggregate bit rate for data
moving concurrently into and out of the MINT. Thus the ring, which is synchronous, has a set of reserved
slots for moving information from each XLH to memory and another set of reserved slots for moving
information from memory to each ILH. It has a read plus write bit rate of over 1.5 Gbps. The memory is 512
bits wide so that an adequate memory bit rate can be achieved with components having reasonable access
times. The size of the memory (16 Mbytes) can be kept small because the occupancy time of information in
the memory is also small (about 0.57 milliseconds under full network load). However, this is an engineerable
number that can be adjusted if necessary.

The XLHs are bi-directional but not symmetric. Information moving from NIM to MINT is stored in MINT
memory. Header information is copied by the XLH and sent to the MINT control for processing. In contrast,
information moving from the switch 10 toward a NIM is not stored in the MINT but simply passes through
the MINT, without being processed, on its way from MANS 10 output to a destination NIM 2. Due to
variable path lengths in the switch, the information leaving the MANS 10 is out of phase with respect to the
XL. A phase alignment and scrambler circuit (described in section 6.1) must align the data before
transmission to the NIM can occur. Section 4.6 describes the internal link handler (ILH).

The MINT performs a variety of functions including (1) some of the overall routing within the network,
(2) participation in user validation, (3) participation in network security, (4) queue management, (5) buffering
of network transactions, (6) address translation, (7) participation in congestion control, and (8) the generation
of operation, administration, and maintenance (OA&M) primitives.

The control for the MINT is a data flow processing system tailored to the MINT control algorithms. Each
MINT is capable of processing up to 80,000 network transactions per second. A fully provisioned hub with
250 MINTs can therefore process 20 million network transactions per second. This is discussed further in
section 2.3.



2.2.4 MAN Switch - MANS 10

The MANS consists of two main parts: (a) the fabric 21 through which information passes and (b) the
control 22 for that fabric. The control allows the switch to be set up in about 50 microseconds. Special
properties of the fabric allow the control to be decomposed into completely independent sub-controllers that
can operate in parallel. Additionally, each sub-controller can be pipelined. Thus, not only is the setup time
very fast but many paths can be set up concurrently and the "setup throughput" can be made high enough
to accommodate high request rates from large numbers of MINTs. MANS can be made in various sizes
ranging from 16 x 16 (handling four MINTs) to 1024 x 1024 (handling 256 MINTs).



2.2.5 End User System Link - EUSL 14

The end user system link 14 connects the NIM 2 to the UIM 13 that resides within the end user's
equipment. It is a full duplex optical fiber link that runs at the same rate and in synchronism with the
external link on the other side of the NIM. It is dedicated to the EUS to which it is connected. The length of
the EUSL is intended to be on the order of meters to 10s of meters. However, there is no reason why it
couldn't be longer if economics allow it.

The basic format and data rate for the EUSL for the present embodiment of the invention was chosen to
be the same as that of the Metrobus Lightwave System OS-1 link. Whatever link layer data transmission
standard is eventually adopted would be used in later embodiments of MAN.


2.2.6 External Links - XL 3

The external link (XL) 3 connects the NIM to the MINT. It is also a full duplex synchronous optical fiber
link. It is used in a demand multiplexed fashion by the end user systems connected to its NIM. The length
of the XL is intended to be on the order of 10s of kilometers. Demand multiplexing is used for economic
reasons. It employs the Metrobus OS-1 format and data rate.



2.2.7 Internal Links - IL 24


The internal link 24 provides connectivity between a MINT and the MAN switch. It is a unidirectional
semi-synchronous link that retains frequency but loses the synchronous phase relationship as it passes
through the MANS 10. The length of the IL 24 is on the order of meters but could be much longer if
economics allowed. The bit rate of the IL is the same as that of OS-1. The format, however, has only limited
similarity to OS-1 because of the need to resynchronize the data.



2.3 Software Overview 

Using a workstation/server paradigm, each end user system connected to MAN is able to generate over
50 EUS transactions per second consisting of LUWUs and SUWUs. This translates into about 400 network
transactions per second (packets and SUWUs). With up to 20 EUSs per NIM, each NIM must be capable of
handling up to 8000 network transactions per second with each MINT handling up to four times this amount
or 32000 network transactions per second. These are average or sustained rates. Burst conditions may
substantially increase "instantaneous" rates for a single EUS 26. Averaging over a number of EUSs will,
however, smooth out individual EUS bursts. Thus while each NIM port must deal with bursts of considerably
more than 50 network transactions per second, NIMs (2) and XLs (3) are likely to see only moderate bursts.
This is even more true of MINTs 11, each of which serves 4 NIMs. The MAN switch 10 must pass an
average of 8 million network transactions per second, but the switch controller does not need to process
this many switch requests since the design of the MINT control allows multiple packets and SUWUs going
to the same destination NIM to be switched with a single switch setup.

A second factor to be considered is network transaction interarrival time. With rates of 150 Mbps and the
smallest network transaction being an SUWU of 1000 bits, two SUWUs could arrive at a NIM or MINT 6.67
microseconds apart. NIMs and MINTs must be able to handle several back-to-back SUWUs on a transient
basis.
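
The sustained rates and the minimum interarrival time quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the count of 256 MINTs is assumed to be that of the largest hub described earlier.

```python
# Figures quoted in the text; variable names are illustrative only.
NET_TX_PER_EUS = 400          # network transactions per second per EUS
EUS_PER_NIM = 20
NIMS_PER_MINT = 4
MINTS_IN_LARGEST_HUB = 256    # assumption: the largest hub described earlier
LINK_BPS = 150e6              # link bit rate, bits per second
SMALLEST_SUWU_BITS = 1000

per_nim = NET_TX_PER_EUS * EUS_PER_NIM             # sustained rate per NIM
per_mint = per_nim * NIMS_PER_MINT                 # sustained rate per MINT
through_switch = per_mint * MINTS_IN_LARGEST_HUB   # about 8 million per second

# Time for one minimum-size SUWU to pass at the link rate, in microseconds.
interarrival_us = SMALLEST_SUWU_BITS / LINK_BPS * 1e6
```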

The control software in the NIMs and especially the MINTs must deal with this severe real-time
transaction processing. The asymmetry and bursty nature of data traffic requires a design capable of
processing peak loads for short periods of time. Thus the transaction control software structure must be
capable of executing many hundreds of millions of CPU instructions per second (100's of MIPS). Moreover,
in MAN, this control software performs a multiplicity of functions including routing of packets and SUWUs,
network port identification, queuing of network transactions destined for the same NIM over up to 1000
NIMs (this means real time maintenance of up to 1000 queues), handling of MANS requests and
acknowledgements, flow control of source EUSs based on complex criteria, network traffic data collection,
congestion control, and a myriad of other tasks.

The MAN control software is capable of performing all of the above tasks in real time. The control
software is executed in three major components: NIM control 23, MINT control 20, and MANS control 22.
Associated with these three control components is a fourth control structure 25 within the UIM 13 of the End
User System 26. FIG. 5 shows this arrangement. Each NIM and MINT has its own control unit. The control
units function independently but cooperate closely. This partitioning of control is one of the architectural
mechanisms that makes possible MAN's real-time transaction processing capability. The other mechanism
that allows MAN to handle high transaction rates is the technique of decomposing the control into a logical
array of subfunctions and independently applying processing power to each subfunction. This approach has
been greatly facilitated by the use of Transputer® very large scale integration (VLSI) processor devices
made by INMOS Corp. The technique basically is as follows:
- Decompose the problem into a number of subfunctions.
- Arrange the subfunctions to form a dataflow structure.
- Implement each subfunction as one or more processes.
- Bind sets of processes to processors, arranging the bound processors in the same topology as the
dataflow structure so as to form a dataflow system that will execute the function.
- Iterate as necessary to achieve the real-time performance required.
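
The decomposition technique listed above can be imitated in software with one worker per subfunction and queues standing in for the processor links. This is only an analogy and not the Transputer implementation: the helper names `stage` and `run_pipeline` are hypothetical, threads stand in for processors, and `None` is reserved here as an end-of-stream marker.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    """Run one subfunction as its own worker, linked to its neighbours by queues."""
    def run():
        while (item := inbox.get()) is not None:
            outbox.put(fn(item))
        outbox.put(None)   # pass the end-of-stream marker downstream
    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(fns, items):
    """Chain the subfunctions in order and push every item through all stages."""
    links = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [stage(fn, links[i], links[i + 1]) for i, fn in enumerate(fns)]
    for item in items:
        links[0].put(item)
    links[0].put(None)
    results = []
    while (out := links[-1].get()) is not None:
        results.append(out)
    for worker in workers:
        worker.join()
    return results
```

Because each stage has a single worker and the queues are first-in-first-out, items leave the pipeline in the order they entered, while different items occupy different stages at the same time.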

Brief descriptions of the functions performed by the NIM, MINT, and MANS (most of which are done by
the software control for those modules) are given in sections 2.2.2 through 2.2.4. Additional information is
given in section 2.4. Detailed descriptions are included later in this description within specific sections
covering these subsystems.



2.3.1 Control Processors



The processors chosen for the system implementation are Transputers from INMOS Corp. These 10
million instructions/second (MIPS) reduced instruction set computer (RISC) machines are designed to be
connected in an arbitrary topology over 20 Mbps serial links. Each machine has four links with an input and
output path capable of simultaneous direct memory access (DMA).



2.3.2 MINT Control Performance

Because of the need to process a large number of transactions per second, the processing of each
transaction is broken into serial sections which form a pipeline. Transactions are fed into this pipeline where
they are processed simultaneously with other transactions at more advanced stages within the pipe. In
addition, there are multiple parallel pipelines each handling unique processing streams simultaneously.
Thus, the required high transaction processing rate, where each transaction requires routing and other
complex servicing, is achieved by breaking the control structure into such a parallel/pipelined fabric of
interconnected processors.

A constraint on MINT control is that any serial processing can take no longer than
1 / (number of transactions per second processed in this pipeline).
A further constraint concerns the burst bandwidth for headers entering the control within an XLH 16. If the
time between successive network units arriving at the XLH is less than
(header size) / (bandwidth into control)
then the XLH must buffer headers. The maximum number of transactions per second assuming uniform
arrival is given by:
(bandwidth into control) / (size of transaction header).
An example based upon the effective bit rate of transputer links and the 40 byte MAN network transaction
header is:
(8.0 Mb/s for control link) / (320 bit header/transaction) = 25,000 transactions/sec. per XLH,
or one transaction per XLH every 40 microseconds. Because transaction interarrival times can be less than
this, header buffering is performed in the XLH.
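
The header-bandwidth example can be checked numerically. The sketch below uses the 8.0 Mb/s effective control link rate and the 40 byte header from the text; the helper `must_buffer` is a hypothetical name added only to express the buffering condition stated above.

```python
CONTROL_LINK_BPS = 8.0e6   # effective bit rate of the control link into MINT control
HEADER_BITS = 40 * 8       # 40 byte MAN network transaction header

# Maximum headers per second that one XLH can pass to control (uniform arrival).
max_tx_per_sec = CONTROL_LINK_BPS / HEADER_BITS

# Time needed to move one header into control, in microseconds.
header_time_us = 1e6 / max_tx_per_sec


def must_buffer(interarrival_us: float) -> bool:
    """True when units arrive faster than a header can be handed to control."""
    return interarrival_us < header_time_us
```

With back-to-back minimum-size SUWUs arriving only a few microseconds apart, `must_buffer` returns true, which is the situation the header buffering in the XLH is there to absorb.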

The MINT must be capable, within this time, of routing, executing billing primitives, making switch
requests, performing network control, memory management, operation, administration, and maintenance
activities, name serving, and also providing other network services such as yellow page primitives. The
parallel/pipelined nature of MINT control 20 achieves these goals.

As an example, the allocating and freeing of high-speed memory blocks can be processed completely
independently of routing or billing primitives. Transaction flow within a MINT is controlled in a single pipe by
the management of the memory block address used for storing a network transaction unit (i.e. packet or
SUWU). At the first stage of the pipe, memory management allocates free blocks of high-speed MINT
memory. Then, at the next stage, these blocks are paired with the headers and routing translation is done.
Then switch units are collected based on memory blocks sent to common NIMs, and to close the loop the
memory blocks are freed after the blocks' data is transmitted into the MANS. Billing primitives are
simultaneously handled within a different pipe.
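
The life cycle of a memory block address through this pipe (allocate, pair and translate, collect per destination NIM, free after transmission) can be sketched as a small state model. The class `MintPipeSketch`, its method names, and the dictionary-based routing table are all hypothetical simplifications; the real stages run on separate processors rather than in one object.

```python
from collections import defaultdict, deque


class MintPipeSketch:
    """Toy model of block-address flow through MINT control (not the real design)."""

    def __init__(self, blocks: int, routes: dict):
        self.free = deque(range(blocks))   # free memory block addresses
        self.routes = routes               # destination name -> destination NIM
        self.per_nim = defaultdict(list)   # destination NIM -> queued block addresses

    def accept(self, dest_name: str) -> int:
        block = self.free.popleft()        # stage 1: allocate a free block
        nim = self.routes[dest_name]       # stage 2: pair with header, translate route
        self.per_nim[nim].append(block)    # stage 3: collect units bound for one NIM
        return block

    def transmit(self, nim: int) -> list:
        # One switch setup carries every unit queued for this NIM.
        blocks = self.per_nim.pop(nim, [])
        self.free.extend(blocks)           # stage 4: free blocks once data is sent
        return blocks
```

Grouping by destination NIM in stage 3 is what allows several packets and SUWUs to share a single switch setup.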



2.4 MAN Operation

The EUS 26 is viewed by the network as a user with capabilities granted by a network administration.
This is analogous to a terminal user logged into a time-sharing system. The user, such as a workstation or a
front end processor acting as a concentrator for stations or even networks, will be required to make a
physical connection at a NIM port and then identify itself via its MAN name, virtual network identification,
and password security. The network adjusts routing tables to map data destined for this name to a unique
NIM port. The capabilities of this user are associated with the physical port. The example just given
accommodates the paradigm of a portable workstation. Ports may also be configured to have fixed
capabilities and possibly be "owned" by one MAN named end user. This gives users dedicated network
ports or provides privileged administrative maintenance ports. The source EUSs refer to the destination by
MAN names or services, so they are not required to know anything about the dynamic network topology.

The high bit rate and large transaction processing capability internal to the network yield very short
response times and provide the EUS with a means to move data in a metropolitan area without undue
network considerations. A MAN end user will see EUS-memory-to-EUS-memory response times as low as a
millisecond, low error rates, and the ability to send a hundred EUS transactions per second on a sustained
basis. This number can expand to several thousand for high performance EUSs. The EUS will send data in
whatever size is appropriate to its needs with no maximum upper bound. Most of the limitations on
optimizing MAN performance are imposed by the limits of the EUS and applications, not the overhead of
the network. The user will supply the following information on transmitting data to the UIM:

- A MAN name and virtual network name for the destination address that is independent of the physical
address.
- The size of the data.
- A MAN type field denoting network service required.
- The data.

Network transactions (packets and SUWUs) move along the following logical path (see FIG. 5):

source UIM → source NIM → MINT → MANS → destination NIM (via MINT) → destination UIM.

Each EUS transaction (i.e., LUWU or SUWU) is submitted to its UIM. Inside the UIM, a LUWU is further
fragmented into variable size packets. An SUWU is not fragmented but is logically viewed in its entirety as
a network transaction. However, the determination that a network transaction is an SUWU is not made until
the SUWU reaches the MINT where the information is used in dynamically categorizing data into SUWUs
and packets for optimal network handling. The NIM checks incoming packets from the EUS to verify that
they do not violate a maximum packet size. The UIM may pick packet sizes smaller than the maximum
depending on EUS stated service. For optimum MINT memory utilization, the packet size is the standard
maximum. However, under some circumstances, the application may request that a smaller packet size be
used because of end user considerations such as timing problems or data availability timing. Additionally,
there may be timing limits where the UIM will send what it currently has from the EUS. Even where the
maximum size packet is used, the last packet of a LUWU usually is smaller than the maximum size packet.

At the transmitting UIM each network transaction (packet or SUWU) is prefixed with a fixed length MAN
network header. It is the information within this header which the MAN network software uses to route, bill,
offer network services, and provide network control. The destination UIM also uses the information within
this header in its job of delivering EUS transactions to the end user. The network transactions are stored in
the UIM source transaction queue from which they are transmitted to the source NIM.

Upon receiving network transactions from UIMs, the NIM receives them in queues permanently
dedicated to the EUSLs on which the transactions arrived, for forwarding to the MINT 11 as soon as the link
3 becomes available. The control software within the NIM processes the UIM to NIM protocol to identify
control messages and prepends a source port number to the transaction that will be used by the MINT to
authenticate the transaction. End-user data will never be touched by MAN network software unless the data
is addressed to the network as control information provided by the end user. As the transactions are
processed, the source NIM concentrates them onto the external link between the source NIM and its MINT.
The source NIM to MINT links terminate at a hardware interface in the MINT (the external link handler or
XLH 16).

The external link protocol between the NIM and MINT allows the XLH 16 to detect the beginning and
end of network transactions. The transactions are immediately moved into a memory 18 designed to handle
the 150 Mb/s bursts of data arriving at the XLH. This memory access is via a high-speed time slotted ring 19
which guarantees each 150 Mb/s XLH input and each 150 Mb/s output from the MINT (i.e. MANS inputs)
bandwidth with no contention. For example, a MINT which concentrates 4 remote NIMs and has 4 input
ports to the center switch must have a burst access bandwidth of at least 1.2 Gb/s. The memory storage is
used in fixed length blocks of a size equal to the maximum packet size plus the fixed length MAN header.
The XLH moves an address of a fixed size memory block followed by the packet or SUWU data to the
memory access ring. The data and network header are stored until the MINT control 20 causes its
transmission into the MANS. The MINT control 20 will continually supply the XLHs with free memory block
addresses for storing the incoming packets and SUWUs. The XLH also "knows" the length of the fixed size
network header. With this information the XLH passes a copy of the network header to MINT control 20.
MINT control 20 pairs the header with the block address it had given the XLH for storing the packet or
SUWU. Since the header is the only internal representation of the data within MINT control it is vital that it
be correct. To ensure sanity due to potential link errors the header has a cyclic redundancy check (CRC) of
its own. The path this tuple takes within MINT control must be the same for all packets of any given LUWU
(this allows ordering of LUWU data to be preserved). Packet and SUWU headers paired with the MINT
memory block address will move through a pipeline of processors. The pipeline allows multiple CPUs to
process different network transactions at various stages of MINT processing. In addition, there are multiple
pipelines to provide concurrent processing.

MINT control 20 selects an unused internal link 24 and requests a path setup from the IL to the
destination NIM (through the MINT attached to that NIM). MAN switch control 22 queues the request and
when (1) the path is available and (2) the XL 3 to the destination NIM is also available, it notifies the source
MINT while concurrently setting up the path. This, on average and under full load, takes 50 microseconds.
Upon notification, the source MINT transmits all network transactions destined for that NIM, thus taking
maximum advantage of the path setup. The internal link handler 17 requests network transactions from the
MINT memory and transmits them over the path:

source IL → MANS → destination IL → XLH,

this XLH being attached to the destination NIM. The XLH recovers bit synchronization on the way to the
destination NIM. Note that information, as it leaves the switch, simply passes through a MINT on its way to
the destination NIM. The MINT doesn't process it in any way other than to recover bit synchronization that
has been lost in going through the MANS.

As information (i.e., switch transactions made up of one or more network transactions) arrives at the
destination NIM it is demultiplexed into network transactions (packets and SUWUs) and forwarded to the
destination UIMs. This is done "on the fly"; there is no buffering in the NIM on the way out of the network.

The receiving UIM 13 will store the network transactions in its receive buffer memory 80 and recreate
EUS transactions (LUWUs and SUWUs). A LUWU may arrive at the UIM in packet sized pieces. As soon as
at least part of a LUWU arrives, the UIM will notify the EUS of its existence and will, upon instructions from
the EUS, transmit under the control of its DMA, partial EUS or whole EUS transactions into the EUS
memory in DMA transfer sizes specified by the EUS. Alternate paradigms exist for transfer from UIM to
EUS. For instance, an EUS can tell the UIM ahead of time that whenever anything arrives the UIM should
transfer it to a specified buffer in EUS memory. The UIM would then not need to announce the arrival of
information but would immediately transfer it to the EUS.



2.5 Additional Considerations



2.5.1 Error Handling

In order to achieve latencies in the order of hundreds of microseconds from EUS memory to EUS
memory, errors must be handled in a manner that differs from that used by conventional data networks
today. In MAN, network transactions have a header check sequence 626 (FIG. 20) (HCS) appended to the
header and a data check sequence 646 (FIG. 20) (DCS) appended to the entire network transaction.

Consider the header first. The source UIM generates a HCS before transmission to the source NIM. At
the MINT the HCS is checked and, if in error, the transaction is discarded. The destination NIM performs a
similar action for a third time before routing the transaction to the destination UIM. This scheme prevents
misdelivery of information due to corrupted headers. Once a header is found to be flawed, nothing in the
header can be considered reliable and the only option that MAN has is to discard the transaction.
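
The check-and-discard policy for headers can be illustrated as follows. The actual HCS algorithm and field width are not specified in this section, so the sketch substitutes a 32-bit CRC from Python's standard `zlib` module purely as a stand-in; the function names `add_hcs` and `check_hcs` are hypothetical.

```python
import zlib
from typing import Optional

HCS_BYTES = 4   # assumed width; the real HCS size is not given in this section


def add_hcs(header: bytes) -> bytes:
    """Source UIM side: append a check sequence computed over the header."""
    return header + zlib.crc32(header).to_bytes(HCS_BYTES, "big")


def check_hcs(framed: bytes) -> Optional[bytes]:
    """Checking side: return the header if it verifies, else None.
    None means the transaction is discarded; no correction or retry is attempted."""
    header, hcs = framed[:-HCS_BYTES], framed[-HCS_BYTES:]
    if zlib.crc32(header).to_bytes(HCS_BYTES, "big") != hcs:
        return None
    return header
```

Returning nothing at all for a failed check mirrors the reasoning above: once the header is suspect, no field in it (including the destination) can be trusted enough to act on.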

The source UIM is also required to provide a DCS at the end of the user data. This field is checked
within the MAN network but no action is taken if errors are found. The information is delivered to the
destination UIM which can check it and take appropriate action. Its use within the network is to identify both
EUSL and internal network problems.

Note that there is never any attempt within the network to correct errors using the usual automatic repeat
request (ARQ) techniques found in most of today's protocols. The need for low latency precludes this. Error
correcting schemes would be too costly except for the headers, and even here the time penalty may be too
great as has sometimes been the case in computer systems. However, header error correction may be
employed later if experience proves that it is needed and time-wise possible.

Consequently, MAN checks for errors and discards transactions when there is reason to suspect the
validity of the headers. Beyond this, transactions are delivered even if flawed. This is a reasonable approach
for three reasons. First, intrinsic error rates over optical fibers are of the same order as error rates over
copper when common ARQ protocols are employed. Both are in the range of 10"" bits per bit. Secondly,
graphics applications (which are increasing dramatically) often can tolerate small error rates where pixel
images are transmitted; a bit or two per image would usually be fine. Finally, where error rates need to be
better than the intrinsic rates, EUS-to-EUS ARQ protocols can be used (as they are today) to achieve these
improved error rates.

2.5.2 Authentication

MAN provides an authentication feature. This feature assures a destination EUS of the identity of the
source EUS for each and every transaction it receives. Malicious users cannot send transactions with forged
"signatures". Users are also prevented from using the network free of charge; all users are forced to
identify themselves truthfully with each and every transaction that they send into the network, thus providing
for accurate usage-sensitive billing. This feature also provides the primitive capability for other features such
as virtual private networks.

When an EUS first attaches to MAN, it "logs in" to a well known and privileged Login Server that is part
of the network. The login server is in an administrative terminal 350 (FIG. 15) with an attached disk memory
351. The administrative terminal 350 is accessed via an OA&M MINT processor 315 (FIG. 14) and a MINT
OA&M monitor 317 in the MINT central control 20, and an OA&M central control (FIG. 15). This login is
achieved by the EUS (via its UIM) sending a login transaction to the server through the network. This
transaction contains the EUS identification number (its name), its requested virtual network, and a password.
In the NIM a port number is prefixed to the transaction before it is forwarded to the MINT for routing to the
server. The Login Server notes the id/port pairing and informs the MINT attached to the source NIM of that
pairing. It also acknowledges its receipt of the login to the EUS, telling the EUS that it may now use the
network.

When using the network, each and every network transaction that is sent to the source NIM from the
EUS has, within its header, its source id plus other information in the header described below with respect
to FIG. 20. The NIM prefixes the port number to the transaction and forwards it to the MINT where the
pairing is checked. Incorrect pairing results in the MINT discarding the transaction. In the MINT, the
prefixed source port number is replaced with a destination port number before it is sent to the destination
NIM. The destination NIM uses this destination port number to complete the routing to the destination EUS.
If an EUS wishes to disconnect from the network, it "logs off" in a manner similar to its login. The Login
Server informs the MINT of this and the MINT removes the id/port information, thus rendering that port
inactive.
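
The id/port check performed in the MINT can be illustrated with a minimal sketch. It assumes the MINT
keeps a simple table from source id to port number; the class and method names are hypothetical, and the
password check made by the Login Server is omitted.

```python
class MintPortTable:
    """Hypothetical id/port table held by a MINT (illustrative only)."""

    def __init__(self):
        self._port_for_id = {}  # source id -> port number noted at login

    def login(self, eus_id, port):
        # The Login Server informs the MINT of the id/port pairing.
        self._port_for_id[eus_id] = port

    def logoff(self, eus_id):
        # Removing the pairing renders the port inactive for that id.
        self._port_for_id.pop(eus_id, None)

    def accept(self, eus_id, port):
        # A transaction is kept only if its header id matches the
        # port number prefixed by the NIM; otherwise it is discarded.
        return self._port_for_id.get(eus_id) == port


table = MintPortTable()
table.login(eus_id=17, port=3)
print(table.accept(17, 3))  # pairing known
print(table.accept(17, 4))  # same id arriving on another port
```

A transaction carrying a forged source id fails the check because the port number is supplied by the
NIM, not by the sending EUS.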

2.5.3. Guaranteed Ordering

From NIM to NIM the notion of a LUWU does not exist. Even though LUWUs lose their identity within
the NIM to NIM envelope, the packets of a given LUWU must follow a path through predetermined XLs and
MINTs. This allows ordering of packets arriving at UIMs to be preserved for a LUWU. However, packets
may be discarded due to flawed headers. The UIM checks for missing packets and notifies the EUS in the
event that this occurs.

2.5.4. Virtual Circuits and Infinite LUWUs

The network does not set up a circuit through to the destination but rather switches groups of packets
and SUWUs as resources become available. This does not prevent the EUS from setting up virtual circuits;
for example the EUS could write an infinite size LUWU with the appropriate UIM timing parameters. Such a
data stream would appear to the EUS as a virtual circuit while to the network it would be a never ending
LUWU that moves packets at a time. The implementation of this concept must be handled between the UIM
and the EUS protocols since there may be many different types of EUS and UIMs. The end-user can be
transmitting multiple data streams to any number of destinations at any one time. These streams are
multiplexed on packet and SUWU boundaries on the transmit link between the source UIM and the source
NIM.

A parameter, to be adjusted for optimum performance as the system is loaded, limits the time
(equivalent to limiting the length of the data stream) that one MINT can send data to a NIM in order to free
that NIM to receive data from other MINTs. An initial value of 2 milliseconds appears reasonable based on
simulations. The value can be adjusted dynamically in response to traffic patterns in the system, with
different values possible for different MINTs or NIMs, and at different times of the day or different days of
the week.
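
As a rough illustration of what the time limit means in data terms, the following sketch converts a time
limit into a maximum burst length. It assumes the MINT-to-NIM link runs at the 150 Mb/s rate quoted
elsewhere for the XLs; the function name is hypothetical.

```python
def max_burst_bits(link_rate_bps, time_limit_us):
    """Bits one MINT can send to a NIM within the time limit (integer arithmetic)."""
    return link_rate_bps * time_limit_us // 1_000_000


# Assumed 150 Mb/s link and the initial 2 ms limit mentioned above.
print(max_burst_bits(150_000_000, 2_000))
```

Under these assumptions a 2 millisecond limit corresponds to a burst of 300,000 bits.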



3 SWITCH 

The MAN switch (MANS) is the fast circuit switch at the center of the MAN hub. It interconnects the
MINTs, and all end-user transactions must pass through it. The MANS consists of the switch fabric itself
(called the data network or DNet), plus the switch control complex (SCC), a collection of controllers and
links that operate the DNet fabric. The SCC must receive requests from the MINTs to connect or disconnect
pairs of incoming and outgoing internal links (ILs), execute the requests when possible, and inform the
MINTs of the outcome of their requests.

These apparently straightforward operations must be carried out at a high performance level. The
demands of the MAN switching problem are discussed in the next section. Next, Section 3.2 presents the
fundamentals of a distributed-control circuit-switched network that is offered as a basis for a solution to such
switching demands. Section 3.3 tailors this approach to the specific needs of MAN and covers some
aspects of the control structure that are critical to high performance.



3.1 Characterizing the Problem

First we estimate some numerical values for the demands on the MAN switch. Nominally, the MANS
must establish or remove a transaction's connection in fractions of a millisecond in a network with hundreds
of ports, each running at 150 Mb/s and each carrying thousands of separately switched transactions per
second. Millions of transaction requests per second imply a distributed control structure where numerous
pipelined controllers process transaction requests in parallel.

The combination of so many ports each running at a high speed has several implications. First, the
bandwidth of the network must be at least 150 Gb/s, thus requiring multiple data paths (nominally 150 Mb/s)
through the network. Second, a 150 Mb/s synchronous network would be difficult to build (although an
asynchronous network needs to recover clock or phase). Third, since inband signaling creates a more
complex (self-routing) network fabric and requires buffering within the network, an out-of-band signaling
(separate control) approach is desirable.

In MAN, transaction lengths are expected to vary by several orders of magnitude. These transactions
can share a single switch, as discussed hereinafter, with adequate delay performance for small transactions.
The advantage of a single fabric is that data streams do not have to be separated before switching and
recombined afterwards.

A problem to be dealt with is the condition where the requested output port is busy. To set up a
connection, the given input and output ports must be concurrently idle (the so-called concurrency problem).
If an idle input (output) port waits for the output (input) to become idle, the waiting port is inefficiently
utilized and other transactions needing that port are delayed. If the idle port is instead given to other
transactions, the original busy destination port may have become idle and busy again in the meantime, thus
adding further delay to the original transaction. The delay problem is worse when the port is busy with a
large transaction.

Any concurrency resolution strategy requires that each port's busy/idle status be supplied to the
controllers concerned with it. To maintain a high transaction rate, this status update mechanism must
operate with short delays.

If transaction times are short and most delays are caused by busy ports, an absolutely non-blocking
network topology is not required, but the blocking probability should be small enough so as not to add
much to delays or burden the SCC with excessive unachievable connection requests.

Broadcast (one to many) connections are a desirable network capability. However, even if the network
supports broadcasting, the concurrency problem (here even worse with the many ports involved) must be
handled without disrupting other traffic. This seems to rule out the simple strategy of waiting for all
destination ports to become idle and broadcasting to all of them at once.

Regardless of the special needs of the MAN network, the MANS satisfies the general requirements for
any practical network. Startup costs are reasonable. The network is growable without disrupting existing
fabric. The topology is inherently efficient in its use of fabric and circuit boards. Finally, the concerns of
operational availability - reliability, fault tolerance, failure-group sizes, and ease of diagnosis and repair - are
met.



3.2 General Approach - A Distributed-Control Circuit-Switching Network

In this section we describe the basic approach used in the MANS. It specifically addresses the means
by which a large network can be run by a group of controllers operating in parallel and independently of
one another. The distributed control mechanism is described in terms of two-stage networks, but with a
scheme to extend the approach to multistage networks. Section 3.3 presents details of the specific design for
MAN.

A major advantage of our approach is that the plurality of network controllers operate independently of
one another using only local information. Throughput (measured in transactions) is increased because
controllers do not burden each other with queries and responses. Also the delay in setting up or tearing
down connections is reduced because the number of sequential control steps is minimized. All this is
possible because the network fabric is partitioned into disjoint subsets, each of which is controlled solely by
its own controller that uses global static information, such as the internal connection pattern of the data
network 120, but only local dynamic (network state) data. Thus, each controller sees and handles only those
connection requests that use the portion of the network for which it is responsible, and monitors the state of
only that portion.



3.2.1 Partitioning Two-Stage Networks

Consider the 9 x 9 two-stage network example in FIG. 6 comprising three input switches IS1 (101), IS2
(102), and IS3 (103), and three output switches OS1 (104), OS2 (105), and OS3 (106). We can partition its
fabric into three disjoint subsets. Each subset includes the fabric in a given second stage switch (OSx) plus
the fabric (or crosspoints) in the first stage switches (ISy) that connect to the links going to that second
stage switch. For example, in FIG. 6, the partition or subset associated with OS1 (104) is shown by a
dashed line around the crosspoints in OS1 plus dashed lines around three crosspoints in each of the first
stage switches (101, 102, 103) (those crosspoints being those that connect to the links to OS1).

Now, consider a controller for this subset of the network. It would be responsible for connections from
any inlet to any outlet on OS1. The controller would maintain busy/idle status for the crosspoints it
controlled. This information is clearly enough to tell whether a connection is possible. For example, suppose
an inlet on IS1 is to be connected to an outlet on OS1. We assume that the request is from the inlet, which
must be idle. The outlet can be determined to be idle from outlet busy/idle status memory or else from the
status of the outlet's three crosspoints in OS1 (all three must be idle). Next, the status of the link between
IS1 and OS1 must be checked. This link will be idle if the two crosspoints on both ends of the link, which
connect the link to the remaining two inlets and outlets, are all idle. If the inlet, outlet, and link are all idle, a
crosspoint in each of IS1 and OS1 can be closed to set up the requested connection.
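
The checks just described can be sketched for the controller of one partition. This is a simplified
illustration, not the patent's implementation: busy/idle state is kept per link and per outlet rather than
per crosspoint, and all names are hypothetical.

```python
class PartitionController:
    """Controls one second stage switch and the single link from each first stage switch."""

    def __init__(self, n_first_stage=3, n_outlets=3):
        self.link_busy = [False] * n_first_stage  # link from IS_i to this OS
        self.outlet_busy = [False] * n_outlets
        self.connections = {}                     # outlet -> (first_stage, inlet)

    def connect(self, first_stage, inlet, outlet):
        # The requesting inlet is assumed idle; only local state is consulted.
        if self.outlet_busy[outlet] or self.link_busy[first_stage]:
            return False
        self.outlet_busy[outlet] = True
        self.link_busy[first_stage] = True
        self.connections[outlet] = (first_stage, inlet)
        return True

    def disconnect(self, outlet):
        first_stage, _ = self.connections.pop(outlet)
        self.outlet_busy[outlet] = False
        self.link_busy[first_stage] = False


os1 = PartitionController()
print(os1.connect(first_stage=0, inlet=0, outlet=0))  # link and outlet idle
print(os1.connect(first_stage=0, inlet=1, outlet=1))  # only link from IS1 is in use
```

The second request is refused although its outlet is idle, because the one link between that first stage
switch and this second stage switch is already in use.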

Note that this activity can proceed independently of activities in the other (disjoint) subsets of the
network. The reason is that the network has only two stages, so the inlet switches may be partitioned
according to their links to second stage switches. In theory this approach applies to any two-stage network,
but the usefulness of the scheme depends on the network's blocking characteristics. The network in FIG. 6
would block too frequently, because it can connect at most one inlet on a given inlet switch to an outlet on
a given second stage switch.

A two-stage network, referred to hereinafter as a Richards network, of the type described in G. W.
Richards et al., "A Two-Stage Rearrangeable Broadcast Switching Network", IEEE Transactions on
Communications, v. COM-33, no. 10, October 1985, avoids this problem by wiring each inlet port to multiple
appearances spread over different inlet switches. The distributed control scheme operates on a Richards
network, even though MAN may not use such Richards network features as broadcast and rearrangement.



3.2.2 Control Network 



3.2.2.1 Function 

In MAN, requests for connections come from inlets, actually, the central control 20 of the MINTs. These
requests must be distributed to the proper switch controller via a control network (CNet). In FIG. 7, both the
DNet 120 for circuit-switched transactions and the control CNet 130 are shown. The DNet is a two-stage
rearrangeably non-blocking Richards network. Each switch 121,123 includes a rudimentary crosspoint
controller (XPC) 122,124 which accepts commands to connect a specified inlet on the switch to a specified
outlet by closing the proper crosspoint. The first and second stages' XPCs (121,123) are abbreviated 1SC
(first stage controller) and 2SC (second stage controller) respectively.

On the right side of the CNet are 64 MANS controllers 140 (MANSCs) corresponding to and controlling
64 disjoint subsets of the DNet, partitioned by second stage outlet switches as described earlier. Since the
controllers and their network are overlaid on the DNet and not integral to the data fabric, they could be
replaced by a single controller in applications where transaction throughput is not critical.



3.2.2.2 Structure

The CNet shown in FIG. 7 has special properties. It consists of three similar parts 130,134,135,
corresponding to flows of messages from a MINT to a MANSC, orders from a MANSC to an XPC, and
acknowledgments or negative acknowledgments (ACKs/NAKs) from a MANSC to a MINT: acknowledge
(ACK), negative acknowledge (NAK). Each of the networks 130,134 and 135 is a statistically multiplexed
time-division switch, and comprises a bus 132, a group of interfaces 133 for buffering control data to a
destination or from a source, and a bus arbiter controller (BAC) 131. The bus arbiter controller controls the
gating of control data from an input to the bus. The address of the destination selects the output to which
the bus is to be gated. The output is connected to a controller (network 130; a MANSC 140) or an interface
(networks 131 and 132, interfaces similar to interface 133). The request inputs and ACK/NAK responses are
concentrated by control data concentrators and distributors 136,138, each control data concentrator
concentrating data to or from four MINTs. The control data concentrators and distributors simply buffer data
from or to the MINTs. The interfaces 133 in the CNet handle statistical demultiplexing and multiplexing
(steering and merging) of control messages. Note that the interconnections made by bus 132 for a given
request message in the DNet are the same as those requested in the CNet.



3.2.3 Connection Request Scenario

The connection request scenario begins with a connection request message arriving at the left of CNet
130 in a multiplexed stream on one of the message input links 137 from one of the data concentrators 136.
This request includes the DNet 120 inlet and outlet to be connected. In the CNet 130, the message is
routed to the appropriate link 139 on the right side of the CNet according to the outlet to be connected,
which is uniquely associated with a particular second stage switch and therefore also with a particular
MANS controller 140.

This MANSC consults a static global directory (such as a ROM) to find which first stage switches carry
the requesting inlet. Independently of other MANSCs, it now checks dynamic local data to see whether the
outlet is idle and any links from the proper first stage switches are idle. If the required resources are idle,
the MANSC sends a crosspoint connect order to its own second stage outlet switch plus another order to
the proper first stage switch via network 134. The latter order includes a header to route it to the correct
first stage.

This approach can achieve extremely high transaction throughput for several reasons. All network
controllers can operate in parallel, independently of one another, and need not wait for one another's data or
go-aheads. Each controller sees only those requests for which it is responsible and does not waste time
with other messages. Each controller's operations are inherently sequential and independent functions and
thus may be pipelined with more than one request in progress at a time.

The above scenario is not the only possibility. Variables to be considered include broadcast -vs-
point-to-point inlets, outlet- -vs- inlet-oriented connection requests, rearrangement -vs- blocking-allowed
operation, and disposition of blocked or busy connect requests. Although these choices are already settled for
MAN, all these options can be handled with the control topology presented, simply by changing the logic in
the MANSCs.



3.2.4 Multistage Networks

This control structure is extendible to multistage Richards networks, where switches in a given stage
are recursively implemented as two-stage networks. The resultant network is one in which connection requests
pass sequentially through S-1 controllers in an S-stage network, where again, controllers are responsible for
disjoint subsets of the network and operate independently, thus retaining the high throughput potential.



3.3 Specific Design for MAN

In this section we first examine those system attributes that drive the design of the MANS. Next, the
data and control networks are described. Finally the functions of the MANS controller are discussed in
detail, including design tradeoffs that affect performance.



3.3.1 System Attributes



3.3.1.1 External and Internal Interfaces

FIG. 7 illustrates a prototypical fully-grown MANS composed of a DNet 121 with 1024 incoming and
1024 outgoing ILs and CNet 22 comprising three control message networks 130,133,134 each with 64
incoming and 64 outgoing message links. The ILs are partitioned into groups of 4, one group for each of
256 MINTs. The DNet is a two-stage network of 64 first stage switches 121 and 64 second stage switches
123. Each switch includes an XPC 122 that takes commands to open and close crosspoints. For each of the
DNet's 64 second stages 123, there is an associated MANSC 140 with a dedicated control link to the XPC
124 in its second stage switch.




Each control link and status link interfaces 4 MINTs to the CNet's left-to-right and right-to-left switch
planes via 4:1 control data concentrators and distributors 136,138 which are also part of the CNet 22. These
may be regarded either as remote concentrators in each 4-MINT group or as parts of their associated 1:64
CNet 130,135 stages; in the present embodiment, they are part of the CNet. A third 64x64 plane 134 of the
CNet gives each MANSC 140 a dedicated right-to-left interface 133 with one link to each of the 64 1SCs
122. Each MINT 11 interfaces with the MANS 10 through its four ILs 12, its request signal to control data
concentrator 136, and the acknowledge signal received back from control data distributor 138.

Alternately, each CNet could have 256 instead of 64 ports on its MINT side, eliminating the
concentrators.


3.3.1.2 Size 

The MANS diagram in FIG. 7 represents a network needed to switch data traffic for up to 20,000 EUSs.
Each NIM is expected to handle and concentrate the traffic of 10 to 20 EUSs onto a 150 Mb/s XL, giving
about 1000 XLs (rounded off in binary to 1024). Each MINT serves 4 XLs for a total of 256 MINTs. Each
MINT also handles 4 ILs, each with an input and an output termination on the DNet portion of the MANS.
The data network thus has 1024 inputs and 1024 outputs. Internal DNet link sizing will be addressed later.
Failure-group size and other considerations lead to a DNet with 32 input links on each first stage switch
121, each of which links is connected to two such switches. There are 16 outputs on each second stage
switch 123 of the DNet. Thus, there are 64 of each type of switch and also 64 MANSCs 140 in the CNet,
one per second stage switch.
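
The sizing figures above follow from simple arithmetic, sketched below. The function and parameter names
are hypothetical; the defaults assume 20 EUSs per NIM (the upper end of the stated range), two appearances
per inlet, and 16 outlets per second stage switch as given in Section 3.3.2.1.

```python
def mans_sizing(eus_count=20_000, eus_per_nim=20, xls_per_mint=4,
                appearances=2, inlets_per_first_stage=32,
                outlets_per_second_stage=16):
    xls_needed = -(-eus_count // eus_per_nim)   # ceiling division
    xls = 1 << (xls_needed - 1).bit_length()    # round up to a power of two
    return {
        "xls": xls,
        "mints": xls // xls_per_mint,
        "first_stage_switches": xls * appearances // inlets_per_first_stage,
        "second_stage_switches": xls // outlets_per_second_stage,
    }


print(mans_sizing())
```

With these defaults the sketch gives 1024 XLs, 256 MINTs, and 64 switches in each stage, matching the
figures in the text.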

3.3.1.3 Traffic and Consolidation

The "natural" EUS transactions of data to be switched vary in size by several orders of magnitude, from
SUWUs of a few hundred bits to LUWUs a megabit or more. As explained in Section 2.1.1, MAN breaks
larger EUS transactions into network transactions or packets of at most a few thousand bits each. But the
MANS deals with the switch transaction, defined as the burst of data that passes through one MANS
connection per one connect (and disconnect) request. Switch transactions can vary in size from a single
SUWU to several LUWUs (many packets) for reasons about to be given. For the rest of Section 3,
"transaction" means "switch transaction" except as noted.

For a given total data rate through the MANS, the transaction throughput rate (transactions/second)
varies inversely with the transaction size. Thus, the smaller the transaction size, the greater the transaction
throughput must be to maintain the data rate. This throughput is limited by the individual throughputs of the
MANSCs (whose connect/disconnect processing delays reduce the effective IL bandwidth) and also by
concurrency resolution (waiting for busy outlets). Each MANSC's overhead per transaction is of course
independent of transaction size.

Although larger transactions reduce the transaction throughput demands, they will add more delays to
other transactions by holding outlets and fabric paths for longer times. A compromise is needed - small
transactions reduce blocking and concurrency delays, but large transactions ease the MANSC and MINT
workloads and improve the DNet duty cycle. The answer is to let MAN dynamically adjust its transaction
sizes under varying loads for the best performance.

The DNet is large enough to handle the offered load, so the switching control complex's (SCC)
throughput is the limiting factor. Under light traffic, the switch transactions will be short, mostly single
SUWUs and packets. As traffic levels increase so does the transaction rate. As the SCC transaction rate
capacity is approached, transaction sizes are dynamically increased to maintain the transaction rate just
below the point where the SCC would overload. This is achieved automatically by the consolidation control
strategy, whereby each MINT always transmits in a single switch transaction all available SUWUs and
packets targeted for a given destination, even though each burst may contain the whole or parts of several
EUS transactions. Further increases in traffic will increase the size, but not so much the number, of
transactions. Thus fabric and IL utilization improve with load, while the SCC's workload increases only
slightly. Section 3.3.3.2.1 explains the feedback mechanism that controls transaction size.
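
The consolidation strategy can be sketched from the MINT's side as follows. This is an illustration
under simplifying assumptions: one pending list per DNet outlet, and a connection request that is either
granted or refused. The class and method names are hypothetical.

```python
from collections import defaultdict


class MintConsolidator:
    """Holds packets per destination outlet and ships them all in one switch transaction."""

    def __init__(self):
        self.pending = defaultdict(list)  # outlet -> packets waiting

    def arrive(self, outlet, packet):
        self.pending[outlet].append(packet)

    def on_request_outcome(self, outlet, granted):
        # A refused request sends nothing; the packets stay queued, so the
        # next request for this outlet covers a larger burst.
        if not granted:
            return []
        return self.pending.pop(outlet, [])


mint = MintConsolidator()
mint.arrive(5, "p1")
mint.arrive(5, "p2")
print(mint.on_request_outcome(5, granted=False))  # nothing sent
mint.arrive(5, "p3")
print(mint.on_request_outcome(5, granted=True))   # all three packets in one burst
```

The transaction size grows only when requests are refused or delayed, which is the load-dependent
behaviour the text describes.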


3.3.1.4 Performance Goals






Nevertheless, MAN's data throughput depends on extremely high performance of individual SCC control
elements. For example, each XPC 122,124 in the data switch will be ordered to set and clear at least
67,000 connections per second. Clearly, each request must be handled in at most a few microseconds.

Likewise, the MANSCs' functions must be done quickly. We assume that these steps will be pipelined;
then the sum of the step processing times will contribute to connect and disconnect delays, and the
maximum of these step times will limit transaction throughput. We aim to hold the maximum and sum to a
few microseconds and a few tens of microseconds, respectively.
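
The relationship between step times, delay, and throughput in a pipelined controller can be made concrete
with a short sketch. The step times used here are hypothetical values chosen only for illustration.

```python
def pipeline_figures(step_times_us):
    """Return (delay in microseconds, throughput in requests per second)."""
    delay_us = sum(step_times_us)                     # every step adds to the delay
    throughput_per_s = 1_000_000 / max(step_times_us) # slowest step sets the rate
    return delay_us, throughput_per_s


# Hypothetical three-step pipeline: 2, 4 and 2 microseconds per step.
print(pipeline_figures([2.0, 4.0, 2.0]))
```

With these assumed values the delay is 8 microseconds and the pipeline accepts 250,000 requests per
second, since a new request can enter every 4 microseconds.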

The resolution of the concurrency problem must also be quick and efficient. Busy/idle status of
destination terminals will have to be determined in about 6 microseconds, and the control strategy must
avoid burdening MANSCs with unfulfillable connection requests.

One final performance issue relates to the CNet itself. The network and its access links must run at high
speeds (probably at least 10 Mb/s) to keep control message transmit times small and so that links will run
at low occupancies to minimize the contention delays from statistical multiplexing.



3.3.2 Data Network (DNet)

The DNet is a Richards two-stage rearrangeably non-blocking broadcast network. This topology was
chosen not so much for its broadcast capability, but because its two-stage structure allows the network to
be partitioned into disjoint subsets for distributed control.



3.3.2.1 Design Parameters 

The capabilities of the Richards network derive from the assignment of inlets to multiple appearances
on different first stage switches according to a definite pattern. The particular assignment pattern chosen,
the number m of multiple appearances per inlet, the total number of inlets, and the number of links between
first and second stage switches determine the maximum number of outlets per second stage switch
permitted for the network to be rearrangeably non-blocking.

The DNet in FIG. 7 has 1024 inlets, each with two appearances on the first stage switches. There are
two links between each first and second stage switch. These parameters along with the pattern of
distributing the inlets ensure that with 16 outlets per second stage switch the network will be rearrangeably
non-blocking for broadcast.

Since MAN does not use broadcast or rearrangement, those parameters not justified by failure-group or
other considerations may be changed as more experience is obtained. For example, if a failure group size
of 32 were deemed tolerable, each second stage switch could have 32 outputs, thus reducing the number
of second stage switches by a factor of 2. Making such a change would depend on the ability of the SCC
control elements each to handle twice as much traffic. In addition, blocking probabilities would increase and
it would have to be determined that such an increase would not significantly detract from the performance
of the network.

The network has 64 first stage switches 121 and 64 second stage switches 123. Since each inlet has
two appearances and there are two links between first and second stage switches, each first stage switch
has 32 inlets and 128 outlets and each second stage has 128 inlets and 16 outlets.
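
The per-switch port counts can be checked with a short calculation. The function and key names are
hypothetical; the inputs are the figures stated in this section.

```python
def switch_ports(n_inlets=1024, appearances=2, links_per_pair=2,
                 n_first=64, n_second=64):
    return {
        # Each inlet appears on `appearances` first stage switches.
        "first_stage_inlets": n_inlets * appearances // n_first,
        # Each first stage switch has `links_per_pair` links to every second stage switch.
        "first_stage_outlets": n_second * links_per_pair,
        "second_stage_inlets": n_first * links_per_pair,
        "second_stage_outlets": n_inlets // n_second,
        # Distinct links over which a second stage switch can reach a given inlet.
        "links_reaching_an_inlet": appearances * links_per_pair,
    }


print(switch_ports())
```

The arithmetic gives 32 inlets and 128 outlets per first stage switch, 128 inlets and 16 outlets per
second stage switch, and four candidate links for any inlet.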



3.3.2.2 Operation 

Since each inlet has two appearances and since there are two links between each first and second
stage switch, any outlet switch can access any inlet on any one of four links. The association of inlets to
links is algorithmic and thus may be computed or alternatively read from a table. The path hunt involves
simply choosing an idle link (if one exists) from among the four link possibilities.
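
A minimal sketch of this path hunt, as seen by one MANSC, is given below. The inlet's two first stage
appearances are passed in as data (as if read from the table mentioned above) because the actual
assignment pattern is not reproduced here; the function name and data layout are hypothetical.

```python
def path_hunt(inlet_appearances, link_busy, links_per_pair=2):
    """Return the first idle (first_stage, link) pair, or None if all four are busy.

    inlet_appearances: the first stage switches that carry the requesting inlet.
    link_busy: dict mapping (first_stage, link) to True when that link to this
               second stage switch is in use; missing entries count as idle.
    """
    for first_stage in inlet_appearances:
        for link in range(links_per_pair):
            if not link_busy.get((first_stage, link), False):
                return (first_stage, link)
    return None


busy = {(3, 0): True, (3, 1): True, (40, 0): True}
print(path_hunt((3, 40), busy))  # one of the four candidate links is still idle
```

Only local state for the one second stage switch is consulted, which is what allows each MANSC to hunt
independently of the others.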

If none of the four links is idle, a re-attempt to make a connection is made later and is requested by the
same MINT. Alternatively, existing connections could be re-arranged to remove the blocking condition, a
simple procedure in a Richards network. However, rerouting a connection in midstream could introduce a
phase glitch beyond the outlet circuit's ability to recover phase and clock. Thus with present circuitry, it is
preferable not to run the MANS as a rearrangeable switch.

Each switch in the DNet has an XPC 122,124 on the CNet, which receives messages from the MANSCs
telling which crosspoints to operate. No high-level logic is performed by these controllers.






3.3.3 Control Network and MANS Controller Functions



3.3.3.1 Control Network (CNet)

The CNet 130,134,135 briefly described earlier, interconnects the MINTs, MANSCs, and 1SCs. It must
carry three types of messages - connect/disconnect orders from MINTs to MANSCs using block 130,
crosspoint orders from MANSCs to 1SCs using block 134, and ACKs and NAKs from MANSCs back to the
MINTs using block 135. The CNet shown in FIG. 7 has three corresponding planes or sections. The private
MANSC 140-2SC 124 links are shown but are not considered part of the CNet as no switching is required.

In this embodiment, the 256 MINTs access the CNet in groups of 4, resulting in 64 input paths to and
64 output paths from the network. The bus elements in the control network perform merging and routing of
message streams. A request message from a MINT includes the ID of the outlet port to be connected or
disconnected. Since the MANSCs are associated one-to-one with second stage switches, this outlet
specification identifies the proper MANSC to which the message is routed.
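
The mapping from an outlet ID to its MANSC, and from a MINT to its concentrator, can be sketched as
integer division. This assumes that outlets and MINTs are numbered consecutively within each group, which
the text does not state; the constant and function names are hypothetical.

```python
OUTLETS_PER_SECOND_STAGE = 16  # outlets on each second stage switch
MINTS_PER_CONCENTRATOR = 4     # MINTs sharing one CNet input path


def mansc_for_outlet(outlet):
    # Assumed consecutive numbering: outlets 0-15 belong to MANSC 0, and so on.
    return outlet // OUTLETS_PER_SECOND_STAGE


def concentrator_for_mint(mint):
    # Assumed consecutive numbering: MINTs 0-3 share concentrator 0, and so on.
    return mint // MINTS_PER_CONCENTRATOR


print(mansc_for_outlet(1023), concentrator_for_mint(255))
```

Under this numbering the 1024 outlets map onto 64 MANSCs and the 256 MINTs onto 64 CNet input paths.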

The MANSCs transmit acknowledgment (ACK), negative acknowledgment (NAK), and 1SC command
messages via the right-to-left portion of the CNet (blocks 134,135). These messages will also be formatted
with header information to route the messages to the specified MINTs and 1SCs.

The CNet and its messages raise significant technical challenges. Contention problems in the CNet
may mirror those of the entire MANS, requiring their own concurrency solution. These are apparent in the
Control Network shown in FIG. 7. The control data concentrators 136 from four lines into one interface may
have contention where more than one message tries to arrive at one time. The data concentrators 136 have
storage for one request from each of the four connected MINTs, and the MINTs ensure that consecutive
requests are sent sufficiently far apart that the previous request from a MINT has already been passed on
by the concentrator before the next arrives. The MINTs time out if no acknowledgement of a request is
received within a prespecified time. Alternatively, the control data concentrators 136 could simply "OR" any
requests received on any input to the output; garbled requests would be ignored and not acknowledged,
leading to a time out.

Functionally what is needed inside the blocks 130,134,135 is a micro-LAN specialized for tiny
fixed-length packets and low contention and minimal delay. Ring nets are easy to interconnect, grow gracefully,
and permit simple tokenless add/drop protocols, but they are ill-suited for so many closely packed nodes
and have intolerable end-to-end delays.

Since the longest message (a MINT's connect order) has under 32 bits, a parallel bus 132 serves as a
CNet fabric that can send a complete message in one cycle. Its arbitration controller 131, in handling
contention for the bus, would automatically solve contention for the receivers. Bus components are
duplicated for reliability (not shown).

3.3.3.2 MAN Switch Controller (MANSC) Operations

FIGS. 8 and 9 show a flowchart of the MANSC's high level functions. Messages to each MANSC 140
include a connect/disconnect bit, SUWU/packet bit, and the IDs of the MANS input and output ports
involved.

3.3.3.2.1 Request Queues; Consolidation (Intake Section, FIG. 8)

Since the rate of message arrivals at each MANSC 140 can exceed its message processing rate, a
MANSC provides entrance queues for its messages. Connect and disconnect requests are handled
separately. Connects are not enqueued unless their requested outlets are idle.

Priority and regular packet connect messages are provided separate queues 150,152 so that priority
packets can be given higher priority. An entry from the regular packet queue 152 is processed only if the
priority queue 150 is empty. This minimizes the priority packets' processing delays at the expense of the
regular packets', but it is estimated that priority traffic will not usually be heavy enough to add much to
packet delays. Even so, delays are likely to be more user-tolerable with the lower priority large data
transactions than with priority transactions. Also, if a packet is one of many pieces of a LUWU, any given
packet delay may have no final effect since end-to-end LUWU delay depends only on the last packet.
Both the priority and regular packet queues are short, intended only to cover short-term random
fluctuations in message arrivals. If the short-term rate of arrivals exceeds the MANSC's processing rate, the
regular packet queue and perhaps the priority queue will overflow. In such cases a control negative
acknowledge (CNAK) is returned to the requesting MINT, indicating a MANSC overload. This is not a
catastrophe, but rather the feedback mechanism in the consolidation strategy that increases switch
transaction sizes as traffic gets heavier. Each MINT combines into one transaction all available packets
targeted for a given DNet outlet. Thus, if a connection request by the MINT results in a CNAK, the next
request for the same destination may represent more data to be shipped during the connection, provided
more packets of the LUWUs have arrived at the MINT in the meantime. Consolidation need not always add
to LUWU transmission delay, since a LUWU's last packet might not be affected. This scheme dynamically
increases effective packet (transaction) sizes to accommodate the processing capability of the MANSCs.

The priority queue is longer than the regular packet queue to reduce the odds of sending a priority
CNAK due to random bursts of requests. Priority packets are less likely to benefit from consolidation than
packets recombining into their original LUWUs; this supports the separate, high-priority queue. To force the
MINTs to consolidate more packets, we may build the regular packet queue shorter than it "ought" to be.
Simulations have indicated that a priority queue of 4 requests capacity and a regular queue of a requests
capacity is appropriate. The sizes of both queues affect system performance and can be fine-tuned with
real experience with a system.

Priority is determined by a priority indicator in the type of service indication 623 (FIG. 20). Voice
packets are given priority because of their required low delay. In alternative arrangements, all single packet
transactions (SUWUs) may be given priority. Because charges are likely to be higher for high priority
service, users will be discouraged from demanding high priority service for the many packets of a long
LUWU.



3.3.3.2.2 Busy/Idle Check 

When a connect request first arrives at a MANSC, it is detected in test 153 which differentiates it from a
disconnect request. The busy/idle status of the destination outlet is checked (test 154). If the destination is
busy, a busy negative acknowledge (BNAK) is returned (action 156) to the requesting MINT, which will try
again later. Test 158 selects the proper queue (priority or regular packet). The queue is tested (160, 162) to
see if it is full. If the specified queue is full, a CNAK (control negative acknowledge) is returned (action 164).
Otherwise the request is enqueued in queue 150 or 152 and simultaneously the destination is seized
(marked busy) (action 166 or 167). Note that an overworked (full queues) MANSC can still return BNAKs,
and that both BNAKs and CNAKs tend to increase transaction sizes through consolidation.
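
The intake steps of FIG. 8 can be illustrated with a small sketch. This is only a toy model under stated assumptions: the class and method names are hypothetical, the returned strings stand in for the NAK messages, and the regular queue capacity is a placeholder because the source figure for it is not legible.

```python
from collections import deque


class MansIntake:
    """Toy model of the MANSC intake section (FIG. 8); all names are illustrative."""

    def __init__(self, n_outlets=16, priority_cap=4, regular_cap=2):
        # regular_cap is a placeholder value, not a figure from the text.
        self.busy = [False] * n_outlets      # outlet busy/idle memory
        self.priority_q = deque()            # queue 150
        self.regular_q = deque()             # queue 152
        self.disconnect_q = deque()          # queue 170
        self.caps = {True: priority_cap, False: regular_cap}

    def request(self, connect, outlet, inlet, priority=False):
        if not connect:                      # test 153: disconnect request
            self.busy[outlet] = False        # outlet is released immediately
            self.disconnect_q.append(outlet)
            return "QUEUED"
        if self.busy[outlet]:                # test 154: busy/idle check comes first
            return "BNAK"
        q = self.priority_q if priority else self.regular_q   # test 158
        if len(q) >= self.caps[priority]:    # tests 160/162: queue full
            return "CNAK"
        q.append((inlet, outlet))
        self.busy[outlet] = True             # seize the outlet (action 166 or 167)
        return "QUEUED"
```

Because the busy test precedes the queue test, a controller whose queues are full still answers a request for a busy outlet with a BNAK rather than a CNAK, as the text notes.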

The busy/idle check and BNAK handle the concurrency problem. The penalty paid for this approach is
that a MINT-to-MANS IL is unusable during the interval between a MINT's issuing a connect request for that
IL and its receipt of an ACK or BNAK. Also the CNet jams up with BNAKs and failing requests under heavy
MANS loads. Busy/idle checks must be done quickly so as not to degrade the connection request
throughput and IL utilization; this explains the performance of a busy test before enqueuing. It may be
desirable further to use separate hardware to pre-test outlets for concurrency. Such a procedure would
relieve the MANSCs and CNets from repeated BNAK requests, increase the successful request throughput,
and permit the MANS to saturate at a higher percentage of its theoretical aggregate bandwidth.



3.3.3.2.3 Path Hunt - MANSC Service Section (FIG. 9)

Priority block 168 gives highest priority to requests from disconnect queue 170, lower priority to
requests from the priority queue 150, and lowest priority to requests from the packet queue 152. When a
connect request is unloaded from the priority or the regular packet queue, its requested outlet port has
already been seized earlier (action 166 or 167), and the MANSC hunts for a path through the DNet. This
merely involves looking up first the two inlets to which the incoming IL is connected (action 172) to find the
four links with access to that incoming IL and checking their busy status (test 174). If all four are busy, a
fabric blocking negative acknowledge (fabric NAK or FNAK) is returned to the
requesting MINT, which will try the request again later (action 178). Also the seized destination outlet is
released (marked idle) (action 176). We expect FNAKs to be rare.

If the four links are not all busy, an idle one is chosen and seized, first a first stage inlet, then a link
(action 180); both are marked busy (action 182). The inlet and link choices are stored (action 184). Now the
MANSC uses its dedicated control path to send a crosspoint connect order to the XPC in its associated
second stage switch (action 188); this connects the chosen link to the outlet. At the same time another
crosspoint order is sent (via the right-to-left CNet plane 134) to the ISC (action 186) required to connect the
link to the inlet port. Once this order arrives at the ISC (test 190), an ACK is returned to the originating
MINT (action 192).
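
The path hunt reduces to a lookup over at most four candidate links. The following is a minimal sketch, assuming a hypothetical `static_map` that lists the candidate (first stage inlet, link) pairs for each incoming IL; the function name and data layout are illustrative and not taken from the figures.

```python
def path_hunt(static_map, link_busy, il):
    """Return an idle (inlet, link) pair for incoming link `il`, or None for an FNAK.

    static_map[il] lists the candidate (first_stage_inlet, link) pairs;
    link_busy is the set of links currently in use.
    """
    for inlet, link in static_map[il]:   # action 172: look up the candidates
        if link not in link_busy:        # test 174: busy check
            # Seize the link. The text also marks the inlet busy; that
            # bookkeeping is omitted in this sketch.
            link_busy.add(link)
            return inlet, link
    return None                          # all candidates busy: FNAK (action 178)


# Hypothetical wiring: IL 0 reaches two first stages, each with two links.
smap = {0: [("a0", 10), ("a0", 11), ("b0", 20), ("b0", 21)]}
busy = {10, 11, 20}
choice = path_hunt(smap, busy, 0)
```

On an FNAK the caller would also release the previously seized outlet (action 176).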



3.3.3.2.4 Disconnects

To release network resources as quickly as possible, disconnect requests are handled separately from
connect requests and at top priority. They have a separate queue 170, built 16 words long (same as the
number of outlets) so it can never overflow. A disconnect is detected in test 153 which receives requests
from the MINT and separates connect from disconnect requests. The outlet is released and the request
placed in disconnect queue 170 (action 193). Now a new connect request for this same outlet can be
accepted even though the outlet is not yet physically disconnected. Due to its higher priority, the
disconnect will tear down the switch connections before the new request tries to reconnect the outlet. Once
enqueued, a disconnect can always be executed. Only the outlet ID is needed to identify the spent
connection; the MANSC recalls this connection's choice of link and crosspoints from local memory (action
185), marks these links idle (action 198) and sends the two XPC orders to release them (actions 186 and
188). Thereafter, test 190 controls the wait for an acknowledgment from the first stage controller and the
ACK is sent to the MINT (action 192). If there is no record of this connection, the MANSC returns a "Sanity
NAK." The MANSC senses status from the outlet's phase alignment and scramble circuit (PASC) 290 to
verify that some data transfer took place.



3.3.3.2.5 Parallel Pipelining

Except for seizure and release of resources, the above steps for one request are independent of other
requests' steps in the same MANSC and thus are pipelined, to increase MANSC throughput. Still more
power is achieved through parallel operations; the path hunt begins at the same time as the busy/idle
check. Note that the transaction rate depends on the longest step in a pipelined process, but the response
time for one given transaction (from request to ACK or NAK) is the sum of the step times involved. The
latter is improved by parallelism but not by pipelining.
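
The distinction between transaction rate and response time can be made concrete with a short calculation. The step times below are invented purely for illustration; the text gives no per-step figures.

```python
def pipeline_metrics(step_times_us):
    """Rate is limited by the longest step; response time is the sum of all steps."""
    rate_per_s = 1e6 / max(step_times_us)
    latency_us = sum(step_times_us)
    return rate_per_s, latency_us


# Hypothetical step times in microseconds (not from the source).
rate, latency = pipeline_metrics([2.0, 5.0, 3.0])
```

With these assumed values the pipeline accepts a new request every 5 microseconds, yet each request still takes 10 microseconds from intake to ACK. Overlapping two of the steps would shorten the 10 microseconds but leave the rate unchanged.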



3.3.4 Error Detection and Diagnosis

Costly hardware, message bits, and time-wasting protocols to the CNet and its nodes to verify every
little message are avoided. For example, each crosspoint order from a MANSC to an XPC does not require
an echo of the command or even an ACK in return. Instead, MANSCs assume that messages arrive
uncorrupted and are acted on correctly, until evidence to the contrary arrives from outside. Audits and
cross-checks are enabled only when there is cause for suspicion. The end users, NIMs and MINTs soon
discover a defect in the MANS or its control complex, and identify the subset of MANS ports involved. Then
the diagnostic task is to isolate the problem for repair and interim work-around.

Once a portion of the MANS is suspect, temporary auditing modes could be turned on to catch the
guilty parties. For suspected ISCs and MANSC, these modes require use of the command ACKs and
echoing. Special messages such as crosspoint audits may also be passed through the CNet. This should
be done while still carrying a light load of user traffic.

Before engaging these internal self-tests (or perhaps to eliminate them entirely), MAN can run
experiments on the MANS to pinpoint the failed circuit, using the MINTs, ILs, and NIMs. For example, if
75% of the test SUWUs sent from a given IL make it to a given outlet, we would conclude that one of the
two links from one of that IL's two first stages is defective. (Note this test must be run under load, lest the
deterministic MANSC always select the same link.) Further experiments can isolate that link. But if several
MINTs are tested and none can send to a particular outlet, then that outlet is marked "out of service" to all
MINTs and suspicion is now focused on that second stage and its MANSC. If other outlets on that stage
work, the fault is in the second stage's fabric. These tests use the status lead from each of a MANSC's 16
PASCs.
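
The 75% figure follows from the four links that connect one IL to a given second stage. As a rough check, assuming that under load each of the four links is equally likely to be chosen (an assumption made here for illustration, not a claim in the text):

```python
from fractions import Fraction


def expected_success(n_links=4, n_defective=1):
    """Fraction of test SUWUs expected to arrive if links are picked uniformly."""
    return Fraction(n_links - n_defective, n_links)


one_bad_link = expected_success()            # Fraction(3, 4)
one_bad_first_stage = expected_success(4, 2) # both links of one first stage fail
```

Under the same assumption, a success rate near 50% would instead point at a whole first stage, since that removes two of the four links.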
Coordinating the independent MINTs and NIMs to run these tests requires a central intelligence with
low-bandwidth message links to all MINTs and NIMs. Given inter-MINT connectivity (see FIG. 15), any
MINT with the needed firmware can take on a diagnostic task. NIMs must be involved anyway to tell
whether test SUWUs reach their destinations. Of course any NIM on a working MINT can exchange
messages with any other such NIM.

3.4 MAN Switch Controller

FIG. 25 is a diagram of MANSC 140. This is the unit which sends control instructions to data network
120 to set up or tear down circuit connections. It receives orders from control network 130 via link 139 and
sends acknowledgments both positive and negative back to the requesting MINTs 11 via control network
135. It also sends instructions to first stage switch controllers via control network 134 to first stage switch
controller 122 and directly to the second stage controller 124 that is associated with the specific MANSC
140.

Inputs are received from inlet 139 at a request intake port 1402. They are processed by intake control
1404 to see if the requested outlet is busy. The outlet memory 1406 contains busy/idle indications of the
outlets for which a MANSC 140 is responsible. If the outlet is idle a connect request is placed into one of
two queues 150 and 152 previously described with respect to FIG. 8. If the request is for a disconnect, the
request is placed in disconnect queue 170. The outlet map 1406 is updated to mark a disconnected outlet
idle. The acknowledge response unit 1408 sends negative acknowledgments if a request is received with an
error or if a connect request is made to a busy outlet or if the appropriate queue 150 or 152 is full.
Acknowledgment responses are sent via control network 135 back to the requesting MINT 11 via distributor
138. All of these actions are performed under the control of intake control 1404.

Service control 1420 controls the setup of paths in data network 120 and the updating of outlet memory
1406 for those circumstances in which no path is available in the data network between the requesting input
link and an available output link. The intake control also updates outlet memory 1406 on connect requests
so that a request which is already in the queue will block another request for the same output link.

Service control 1420 examines requests in the three queues 150, 152, and 170. Disconnect requests
are always given the highest priority. For disconnect requests, the link memory 1424 and path memory
1426 are examined to see which links should be made idle. The instructions for idling these links are sent
to first stage switches from first stage switch order port 1428 and the instructions to second stage switches
are sent from second stage switch order port 1430. For connect requests, the static map 1422 is consulted
to see which links can be used to set up a path from the requesting input link to the requested output link.
Link map 1424 is then consulted to see if appropriate links are available and if so these links are marked
busy. Path memory 1426 is updated to show that this path has been set up so that on a subsequent
disconnect order the appropriate links can be made idle. All of these actions are performed under the
control of service control 1420.

Controllers 1420 and 1404 may be a single controller or separate controllers and may be program
controlled or controlled by sequential logic. There is a great need for very high-speed operations in these
controllers because of the high throughput demanded, which makes a hard wired controller preferable.



3.5 Control Network

Control message network 130 (FIG. 7) takes outputs 137 from data concentrators 136 and transmits
these outputs, representing connect or disconnect requests, to MAN switch controllers 140. Outputs of
concentrators 136 are stored temporarily in source registers 133. Bus access controller 131 polls these
source registers 133 to see if any have a request to be transmitted. Such requests are then placed on bus
132, whose output is stored temporarily in intermediate register 141. Bus access controller 131 then sends
outputs from register 141 to the appropriate one of the MAN switch controllers 140 via link 139 by placing
the output of register 141 on bus 142 connected to link 139. The action is accomplished in three phases.
During the first phase, the output of register 133 is placed on the bus 132, thence gated to register 141.
During the second phase, the output of register 141 is placed on bus 142 and delivered to a MAN switch
controller 140. During the third phase, the MAN switch controller signals the source register 133 as to
whether the controller has received the request; if so, source register 133 can accept a new input from
control data concentrator 136. Otherwise, source register 133 retains the same request data and the bus
access controller 131 will repeat the transmission later. The three phases may occur simultaneously for
three separate requests. Control networks 134 and 135 operate in a fashion similar to control network 130.






3.6 Summary

A structure to meet the large bandwidth and transaction throughput requirements for the MANS has
been described. The data switch fabric is a two-stage Richards network, chosen because its low blocking
probability permits a parallel, pipelined distributed switch control complex (SCC). The SCC includes XPCs
in all first and second stage switches, an intelligent controller MANSC with each second stage, and the
CNet that ties the control pieces together and links them to the MINTs.



4 MEMORY AND INTERFACE MODULE

The memory and interface module (MINT) provides receive interfaces for the external fiber-optic links,
buffer memory, control for routing and link protocols, and transmitters to send collected data over the links
to the MAN switch. In the present design, each MINT serves four network interface modules (NIMs) and has
four links to the switch. The MINT is a data switching module.



4.1 Basic Functions

The basic functions of the MINT are to provide the following:

1. A fiber-optic receiver and link protocol handler for each NIM.

2. A link handler and transmitter for each link to the switch.

3. A buffer memory to accumulate packets awaiting transmission across the switch.

4. An interface to the controller for the switch to direct the setup and teardown of network paths.

5. Control for address translation, routing, making efficient use of the switch, orderly transmission
of accumulated packets, and management of buffer memory.

6. An interface for operation, administration, and maintenance of the overall system.

7. A control channel to each NIM for operation, administration, and maintenance functions.



4.2 Data Flow

In order to understand the descriptions of the individual functional units that make up a MINT, it is first
necessary to have a basic understanding of the general flow of data and control. FIG. 10 shows an overall
view of the MINT. Data enters the MINT on a high-speed (100-150 Mbit/s) data channel 3 from each NIM.
This data is in the form of packets, on the order of 8 Kilobits long, each with its own header containing
routing information. The hardware allows for packet sizes in increments of 512 bits to a maximum of 128
Kilobits. Small packet sizes, however, reduce throughput due to the per-packet processing required. Large
maximum packet sizes result in wasted memory for transactions of less than a maximum size packet. The
link terminates on an external link handler 16 (XLH), which retains a copy of the pertinent header fields as it
deposits the entire packet into the buffer memory. This header information, together with the buffer memory
address and length, is then passed to the central control 20. The central control determines the destination
NIM from the address and adds this block to the list of blocks (if any) awaiting transmission to this same
destination. The central control also sends a connection request to the switch controller if there is not
already a request outstanding. When the central control receives an acknowledgement from the switch
controller that a connection request has been satisfied, the central control transmits the list of memory
blocks to the proper internal link handler 17 (ILH). The ILH reads the stored data from memory and
transmits it at high speed (probably the same speed as the incoming links) to the MAN switch, which
directs it to its destination. As the blocks are transmitted, the ILH informs the central control so that the
blocks can be added to the list of free blocks available for use by the XLHs.
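
The bookkeeping described above can be sketched as follows. This is a simplified illustration with hypothetical names: `requests_sent` stands in for the path to the switch controller, and blocks are returned to the free list as soon as they are handed over, whereas the text frees them as the ILH reports each transmission.

```python
from collections import defaultdict


class MintCentralControl:
    """Toy model of the central control's per-destination block lists."""

    def __init__(self, free_blocks):
        self.free = list(free_blocks)        # blocks available to the XLHs
        self.waiting = defaultdict(list)     # destination -> blocks awaiting transmission
        self.outstanding = set()             # destinations with a request in flight
        self.requests_sent = []              # stand-in for the switch controller link

    def block_received(self, dest, block):
        self.waiting[dest].append(block)
        if dest not in self.outstanding:     # at most one outstanding request per destination
            self.outstanding.add(dest)
            self.requests_sent.append(dest)

    def connection_acked(self, dest):
        """Hand the whole list for `dest` to the ILH as one transaction."""
        blocks = self.waiting.pop(dest, [])
        self.outstanding.discard(dest)
        self.free.extend(blocks)             # simplification: freed on hand-over
        return blocks
```

Blocks that arrive while a request is outstanding simply join the existing list, so a delayed or refused connection naturally carries more data when it is finally granted.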



4.3 Memory Modules

The buffer memory 13 (FIG. 4) of the MINT 11 satisfies three requirements:

1. The quantity of memory provides sufficient buffer space to hold the data accumulated (for all
destinations) while awaiting switch setups.

2. The memory bandwidth is adequate to support simultaneous activity on all eight links (four
receiving and four transmitting).

3. The memory access provides for efficient streaming of data to and from the link handlers.



4.3.1 Organization

Because of the amount of memory required (Megabytes), it is desirable to employ conventional
high-density dynamic random access memory (DRAM) parts. Thus, high bandwidth can be achieved only by
making the memory wide. The memory is therefore organized into 16 modules 201,...,202 which make up a
composite 512-bit word. As will be seen below, memory accesses are organized in a synchronous fashion
so that no module ever receives successive requests without sufficient time to perform the required cycles.
The range of memory for one MINT 11 in a typical MAN application is 16-64 Mbytes. The number is
sensitive to the speed of application of flow control in overload situations.



4.3.2 Time Slot Assigners

The time slot assigners 203,...,204 (TSAs) combine the functions of a conventional DRAM controller and
a specialized 8-channel DMA controller. Each receives read/write requests from logic associated with the
Data Transport Ring 19 (see 4.4, below). Its setup commands come from dedicated control time slots on
this same ring.

4.3.2.1 Control 

From a control viewpoint, the TSA appears as a set of registers as shown in FIG. 11. For each XLH
there is an associated address register 210 and count register 211. Each ILH also has address 213 and
count 214 registers, but in addition has registers containing the next address 215 and count 216, thus
allowing a series of blocks to be read from memory in a continuous stream with no inter-block gaps. A
special set of registers 220-226 allows the MINT's central control section to access any of the internal
registers in the TSA or to perform a directed read or write of any particular word in memory. These
registers include a write data register 220 and read data register 221, a memory address register 222,
channel status register 223, error register 224, memory refresh row address register 225, and diagnostic
control register 226.



4.3.2.2 Operation 

In normal operation, the TSA 203 receives only four order types from the ring interface logic: (1) "write"
requests for data received by an XLH, (2) "read" requests for an ILH, (3) "new address" commands issued
by either an XLH or an ILH, and (4) "idle cycle" indications which tell the TSA to perform a refresh cycle or
other special operation. Each order is accompanied by the identity of the link handler involved and, in the
case of "write" and "new address" requests, by 32 bits of data.

For a "write" operation, the TSA 203 simply performs a memory write cycle using the address from the
register associated with the indicated XLH 16 and the data provided by the ring interface logic. It then
increments the address register and decrements the count register. The count register is used in this case
only as a safety check since the XLH should provide a new address before overflowing the current block.

For a "read" operation, the TSA 203 must first check whether the channel for this ILH is active. If it is,
the TSA performs a memory read cycle using the address from the register for this ILH 17 and presents the
data to the ring interface logic. It also increments the address register and decrements the count register. In
any case, the TSA provides the interface logic with two tag bits which indicate (1) no data available, (2)
data available, (3) first word of packet available, or (4) last word of packet available. For case (4), the TSA
will load the ILH's address 214 and count 213 registers from its "next address" 216 and "next count" 215
registers, provided that these registers have been loaded by the ILH. If they have not, the TSA marks the
channel "inactive."

From the above descriptions, the function of a "new address" operation can be inferred. The TSA 203
receives the link identity, a 24-bit address, and an 8-bit count. For an XLH 16, it simply loads the associated
registers. In the case of an ILH 17, the TSA must check whether the channel is active. If it is not, then the
normal address 214 and count 213 registers are loaded and the channel is marked active. If the channel is
currently active, then the "next address" 216 and "next count" 215 registers must be loaded instead of the
normal address and count registers.
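
The interplay of the normal and "next" registers for one ILH channel can be modeled in a few lines. This is a simplified sketch with hypothetical names: the end of a block is treated as the end of a packet, the "first word of packet" tag is not modeled, and memory is a plain mapping from address to word.

```python
class IlhChannel:
    """Toy model of the register set a TSA keeps for one ILH (FIG. 11)."""

    def __init__(self):
        self.addr = 0
        self.count = 0
        self.next = None            # pending (address, count), if loaded
        self.active = False

    def new_address(self, addr, count):
        if self.active:
            self.next = (addr, count)    # channel busy: load the "next" registers
        else:
            self.addr, self.count, self.active = addr, count, True

    def read(self, memory):
        """Return (tag, word), where tag is 'none', 'data', or 'last'."""
        if not self.active:
            return "none", None
        word = memory[self.addr]
        self.addr += 1
        self.count -= 1
        if self.count > 0:
            return "data", word
        if self.next is not None:        # chain to the next block without a gap
            (self.addr, self.count), self.next = self.next, None
        else:
            self.active = False          # nothing queued: channel goes inactive
        return "last", word
```

Loading the next block while the current one is still being read is what lets the ILH stream a series of blocks with no idle slots between them.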

In an alternative embodiment, the two tag bits are also stored in buffer memory 201,...,202.
Advantageously, this permits packet sizes that are not limited to being a multiple of the overall width of the memory
(512 bits). In addition, the ILH 17 need not provide the actual length of the packet when reading it, thus
relieving the central control 20 of the need to pass along this information to the ILH.



4.4 Data Transport Ring 

It is the job of the Data Transport Ring 19 to carry control commands and high-speed data between the
link handlers 16, 17 and the memory modules 201,...,202. The ring provides sufficient bandwidth to allow all
the links to run simultaneously, but carefully apportions this bandwidth so that circuits connecting to the ring
are never required to transfer data in high-speed bursts. Instead, a fixed time slot cycle is employed that
assigns slots to each circuit at well-spaced intervals. The use of this fixed cycle also means that source and
destination addresses need not be carried on the ring itself since they can be readily determined at any
point by a properly synchronized counter.



4.4.1 Electrical Description 

The ring is 32 data bits wide and is clocked at 24 MHz. This bandwidth is sufficient to support data
rates of up to 150 Mbit/s. In addition to the data bits, the ring contains four parity bits, two tag bits, a sync
bit to identify the start of a superframe, and a clock signal. Within the ring, single-ended ECL circuitry is
used for all signals except the clock, which is differential ECL. The ring interface logic provides connecting
circuits with TTL-compatible signal levels.
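
The 150 Mbit/s figure can be checked against the time slot cycle of Section 4.4.3, in which each link handler owns 16 of every 80 slots (one per memory module). The constant names below are chosen here for illustration.

```python
RING_WIDTH_BITS = 32          # data bits carried per time slot
RING_CLOCK_HZ = 24_000_000    # one time slot per clock
SLOTS_PER_FRAME = 80          # data frame length (Section 4.4.3)
DATA_SLOTS_PER_LINK = 16      # one slot per memory module for each link handler


def per_link_rate_bps():
    """Data rate available to one link handler, in bits per second."""
    return RING_WIDTH_BITS * RING_CLOCK_HZ * DATA_SLOTS_PER_LINK // SLOTS_PER_FRAME


rate = per_link_rate_bps()    # 153_600_000
```

The result, 153.6 Mbit/s per link, sits just above the 150 Mbit/s upper bound quoted for the external links.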



4.4.2 Time Slot Sequencing Requirements

In order to meet the above objectives, the time slot cycle is subject to a number of constraints:

1. During each complete cycle there must be a unique time slot for each combination of source and
destination.

2. Each connecting circuit must see its data time slots appearing at reasonably regular intervals.
Specifically, each circuit must have a certain minimum interval between its data time slots.

3. Each link handler must see its data time slots in numerical order by memory module number.
(This is to avoid making the link handler shuffle a 512-bit word.)

4. Each TSA must have a known interval during which it can perform a refresh cycle or other
miscellaneous memory operation.

5. Since the TSAs in the memory modules must examine every control time slot there must also be
a minimum interval between control time slots.



4.4.3 Time Slot Cycle 

Table I shows one data frame of a timing cycle which meets these requirements. One data frame
consists of a total of 80 time slots, of which 64 are used for data and the remaining 16 for control. The table
shows, for each memory module TSA, the slot during which it receives data from each XLH to be written
into memory and during which it must supply data that was read from memory for each ILH. Every fifth slot
is a control time slot during which the indicated link handler broadcasts control orders to all the TSAs. For
the purposes of this table, XLHs and ILHs are numbered 0-3, and TSAs are numbered 0-15. TSA 0, for
example, during time slot 0 receives data from XLH 0 and must supply data for ILH 0. During slot 17, TSA 0
performs similar operations for XLH 2 and ILH 2. Slot 46 is used for XLH 1 and ILH 1, and slot 63 is used
for XLH 3 and ILH 3. The re-use of the same time slot for reading and writing is permissible since XLHs
never read from memory and ILHs never write, thus effectively doubling the data bandwidth of the ring.
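
The data slot assignments listed in Table I follow a regular pattern that can be written as a short function. The formula below is inferred from the table values and is not stated in the text; the offsets and the function name are illustrative.

```python
# TSA served by link handlers 0..3 in the first group of five slots of Table I.
OFFSETS = (0, 7, 13, 4)


def data_slot(slot):
    """Return (tsa, handler) for a data slot of the 80-slot frame, or None for a control slot."""
    group, pos = divmod(slot, 5)
    if pos == 4:                       # every fifth slot carries control
        return None
    return (group + OFFSETS[pos]) % 16, pos


# The four examples given for TSA 0 in the text.
examples = {s: data_slot(s) for s in (0, 17, 46, 63)}
```

Each link handler advances by one memory module every five slots, which gives it the modules in numerical order and gives every module a fixed minimum spacing between accesses.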

The control time slots are assigned, in sequence, to the four XLHs, the four ILHs, and the central
control (CC). With these nine entities sharing the control time slots, the control frame is 45 time slots long.
The 80-slot data frame and the 45-slot control frame come into alignment every 720 time slots. This period
is the superframe and is marked by the superframe sync signal.
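
The 720-slot superframe is the least common multiple of the two frame lengths. The sketch below reproduces that arithmetic and, assuming the control sequence starts with XLH 0 at slot 4 of the superframe as in Table I, names the owner of any control slot. The helper name and the string labels are illustrative.

```python
import math

DATA_FRAME = 80                 # slots per data frame
CONTROL_SPACING = 5             # every fifth slot is a control slot
SOURCES = ["XLH0", "XLH1", "XLH2", "XLH3", "ILH0", "ILH1", "ILH2", "ILH3", "CC"]

CONTROL_FRAME = CONTROL_SPACING * len(SOURCES)                            # 45
SUPERFRAME = DATA_FRAME * CONTROL_FRAME // math.gcd(DATA_FRAME, CONTROL_FRAME)


def control_source(slot):
    """Owner of a control slot, counting slots from the start of the superframe."""
    if slot % CONTROL_SPACING != CONTROL_SPACING - 1:
        return None                                   # a data slot
    return SOURCES[(slot // CONTROL_SPACING) % len(SOURCES)]
```

Under this assumption slot 24 belongs to ILH 0 and slot 44 to the central control, after which the sequence repeats.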

There is a subtle synchronization condition that must also be met for the ILHs. The words of a block
must be sent in sequence beginning with word 0, regardless of where in the ring timing cycle the order was
received. To assist in meeting this requirement, the ring interface circuitry provides a special "word 0" sync
signal for each ILH. For example, in the timing cycle of Table I a new address might be sent by ILH 0
during time slot 24 (its control time slot). It is necessary to ensure that TSA number 0 is the first TSA to act
on this new address (requirement 3 in section 4.4.2) even though the data time slots for reads from TSAs
numbered 5 through 15 for ILH 0 immediately follow time slot 24.

Since the number of time slots in the superframe, 720, exceeds the number of elements on the ring, 25,
it is apparent that the logical time slots do not have a permanent existence; each time slot is, in effect,
created at a particular physical location on the ring and propagates around the ring until it returns to this
location, where it vanishes. The effective creation point is different for data time slots than for control time
slots.






TABLE I

RING TIME SLOT ASSIGNMENT

Time Slot   Write to TSA   From XLH   Read from TSA   To ILH   Control Slot Source

   00            0            0            0            0
   01            7            1            7            1
   02           13            2           13            2
   03            4            3            4            3
   04                                                           XLH0
   05            1            0            1            0
   06            8            1            8            1
   07           14            2           14            2
   08            5            3            5            3
   09                                                           XLH1
   10            2            0            2            0
   11            9            1            9            1
   12           15            2           15            2
   13            6            3            6            3
   14                                                           XLH2
   15            3            0            3            0
   16           10            1           10            1
   17            0            2            0            2
   18            7            3            7            3
   19                                                           XLH3
   20            4            0            4            0
   21           11            1           11            1
   22            1            2            1            2
   23            8            3            8            3
   24                                                           ILH0
   25            5            0            5            0
   26           12            1           12            1
   27            2            2            2            2
   28            9            3            9            3
   29
   30            6            0            6            0
   31           13            1           13            1
   32            3            2            3            2
   33           10            3           10            3
   34
   35            7            0            7            0
   36           14            1           14            1
   37            4            2            4            2
   38           11            3           11            3
   39
   40            8            0            8            0
   41           15            1           15            1
   42            5            2            5            2
   43           12            3           12            3
   44
   45            9            0            9            0
   46            0            1            0            1
   47            6            2            6            2
   48           13            3           13            3
   49
   50           10            0           10            0
   51            1            1            1            1
   52            7            2            7            2
   53           14            3           14            3
   54
   55           11            0           11            0
   56            2            1            2            1
   57            8            2            8            2
   58           15            3           15            3
   59
   60           12            0           12            0
   61            3            1            3            1
   62            9            2            9            2
   63            0            3            0            3
   64
   65           13            0           13            0
   66            4            1            4            1
   67           10            2           10            2
   68            1            3            1            3
   69
   70           14            0           14            0
   71            5            1            5            1
   72           11            2           11            2
   73            2            3            2            3
   74
   75           15            0           15            0
   76            6            1            6            1
   77           12            2           12            2
   78            3            3            3            3
   79











4.4.3.1 Data Time Slots

Data time slots can be considered to originate at the owning XLH. A data time slot is used to carry
incoming data to its assigned memory module, at which point it is re-used to carry outgoing data to the
corresponding ILH. Since XLHs never receive information from a data time slot, the ring can be considered
to be logically broken (for data time slots only) between the ILHs and the XLHs.

The two tag bits identify the contents of the data time slots as follows:

11 Empty

10 Data

01 First word of packet

00 Last word of packet

The "first word of packet" is sent only by memory module 0 when it sends the first word of a packet to an
ILH. The "last word of packet" indication is sent only by memory module 15 when it sends the end of a
packet to an ILH.

4.4.3.2 Control Time Slots

Control time slots originate and terminate at the station of central control 20 on the ring. The link
handlers use their assigned control slots only to broadcast orders to the TSAs. The CC is assigned every
ninth control time slot. The TSAs receive orders from all control time slots and send responses back to the
CC on the CC control time slot.

The two tag bits identify the contents of a control time slot as follows: 

11 Empty 

10 Data (to or from CC) 
55 01 Order 

.00 Address & count (from a link handler) 
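The two tag lists above can be summarized as a pair of lookup tables. The following Python sketch is illustrative only: the names DATA_SLOT_TAGS, CONTROL_SLOT_TAGS and decode_tag are hypothetical, and treating the tag as a two-bit integer is an assumption made for the example.

```python
# Illustrative lookup of the two-bit tag meanings listed above.
# All names here are hypothetical; only the bit patterns come from the text.
DATA_SLOT_TAGS = {
    0b11: "empty",
    0b10: "data",
    0b01: "first word of packet",
    0b00: "last word of packet",
}

CONTROL_SLOT_TAGS = {
    0b11: "empty",
    0b10: "data (to or from CC)",
    0b01: "order",
    0b00: "address & count (from a link handler)",
}


def decode_tag(tag: int, control_slot: bool) -> str:
    """Return the meaning of a two-bit tag for a data or control time slot."""
    if not 0 <= tag <= 3:
        raise ValueError("tag must fit in two bits")
    table = CONTROL_SLOT_TAGS if control_slot else DATA_SLOT_TAGS
    return table[tag]
```

The same bit pattern means different things in the two slot types, which is why the slot type has to be supplied along with the tag.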






4.5 External Link Handler

The principal function of the XLH is to terminate the incoming high-speed data channel from a NIM,
deposit the data in the MINT's buffer memory, and pass the necessary information to the MINT's central
control 20 so that the data can be forwarded to its destination. In addition, the XLH terminates an incoming
low-speed control channel that is multiplexed on the fiber link. Some of the functions assigned to the
low-speed control channel are the transmission of the NIM status and control of flow in the network. It
should be noted that the XLH is only terminating the incoming fiber from the NIM. Transmission to the NIM is
handled by the internal link handler and the phase alignment and scrambler circuit that will be described
later. The XLH uses an onboard processor 268 to interface to the hardware of the MINT central control 20.
The four 20 Mbit/sec links coming from this processor provide the connectivity to the central control
section of the MINT. FIG. 12 shows an overall view of the XLH.



4.5.1 Link Interface

The XLH contains the fiber optic receiver, clock recovery circuit and descrambler circuit needed to
recover data from the fiber. After the data clock is recovered (block 250) and the data descrambled (block
252), the data is converted from serial to parallel and demultiplexed (block 254) into the high-speed
data channel and the low-speed data channel. Low level protocol processing is then performed on the data
on the high-speed data channel (block 256) as described in §5. This results in a data stream consisting of
only packet data. The stream of packet data then goes through a first-in-first-out (FIFO) queue 258 to a data
steering circuit 260 which steers the header into the header FIFO 266 and sends the complete packet to the
XLH's ring interface 262.



4.5.2 Ring Interface

The ring interface 262 logic controls transfer of data from the packet FIFO 258 in the link interface to
the MINT's buffer memory. It provides the following functions:

1. Establishing and maintaining synchronization with the ring's timing cycle.

2. Transfer of data from the link interface FIFO to the proper ring time slots.

3. Sending a new address to the memory TSAs when the end of a packet is encountered.

It should be noted that resynchronization with the ring's 16-word (per XLH) timing cycle will have to be
performed during the processing of a packet whenever the link interface FIFO becomes temporarily empty.
This will be a normal occurrence since the ring's bandwidth is higher than the link's transmission rate. The
ring and TSA, however, are designed to accommodate gaps in the data stream. Thus, resynchronization
consists simply of waiting for data to become available and for the ring cycle to return to the proper word
number, marking the intervening time slots "empty." For example, if the FIFO 258 becomes empty when a
word destined for the fifth memory module is needed, it is necessary to ensure that the next word actually
sent goes to that memory module, in order to preserve the overall sequence.
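The resynchronization rule described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the circuit itself: it assumes that word i of a packet must travel in ring word number i modulo 16, and the names schedule, available_at and EMPTY are hypothetical.

```python
CYCLE = 16    # ring words per XLH timing cycle (from the text)
EMPTY = None  # stands in for a time slot tagged "empty"


def schedule(words, available_at, n_slots):
    """Place packet words into successive ring time slots.

    words:        packet words in order; word i is assumed to belong in
                  ring word number i % CYCLE (an assumption of this sketch).
    available_at: available_at[i] is the first slot index at which word i
                  is present in the link interface FIFO.
    Returns a list of n_slots entries, each a word or EMPTY.
    """
    out = []
    nxt = 0  # index of the next word waiting to be sent
    for slot in range(n_slots):
        word_number = slot % CYCLE
        ready = nxt < len(words) and available_at[nxt] <= slot
        if ready and word_number == nxt % CYCLE:
            out.append(words[nxt])
            nxt += 1
        else:
            # Either the FIFO is empty or the ring cycle has not yet
            # returned to the proper word number: mark the slot empty.
            out.append(EMPTY)
    return out
```

If word 5 arrives late (say at slot 7), slots 5 through 20 are marked empty and word 5 is sent in slot 21, the next slot whose word number is 5, so the overall sequence is preserved.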



4.5.3 Control

The control portion of the XLH is responsible for replenishing the free block FIFO 270 and passing the
header information about each packet received to the MINT's central control 20 (FIG. 4).



4.5.3.1 Header Processing

At the same time a packet is being transmitted on the ring, the header of the packet is deposited in the
header FIFO 266 that is subsequently read by the XLH processor 268. In this header are the source and
destination address fields, which the central control will require for routing. In addition, the header checksum
is verified to ensure that these fields have not been corrupted. The header information is then packaged
with a memory block descriptor (address and length) and sent in a message to the central control 20 (FIG. 






4.5.3.2 Interaction with Central Control

There are only two basic interactions with the MINT's central control. The XLH control attempts to keep
its free-block FIFO 270 full with block addresses obtained from the memory manager, and it passes header
information and memory block descriptors to the central control so that the block can be routed to its
destination. The block addresses are subsequently placed on the ring 19 by ring interface 262 upon receipt
of the address from control sequencer 272. Both interactions with the central control are carried out over
links from XLH processor 268 to the appropriate sections of the central control.



4.6 Internal Link Handler

The internal link handler (ILH) (FIG. 13) is the first part of what can be considered a distributed link
controller. At any instant in time this distributed link controller consists of a particular ILH, a path through
the switch fabric and a particular Phase Alignment and Scrambler Circuit 290 (PASC). The PASC is
described in section 6.1. It is the PASC that is actually responsible for the transmission of optical signals
over the return fiber of fiber pair 3 to the NIM from the MINT. The information that is transmitted over the
fiber comes from the MANS 10, which receives inputs at different times from the ILHs sending to that NIM.
This kind of distributed link controller is necessary since path lengths through the MAN switch fabric are not
all equal. If the PASC did not align all of the information coming from different ILHs to the same reference
clock, information received by the NIM would be continually changing its phase and bit alignment.

The combination of the ILH with the PASC is in many ways a mirror image of the XLH. The ILH
receives lists of block descriptors from the central control, reads these blocks from memory, and transmits
the data over the serial link to the switch. As data is received from memory, the associated block descriptor
is sent to the central control's memory manager so that the block can be returned to the free list.

The ILH differs from the XLH in that the ILH performs no special header processing, and the TSAs
provide the ILH with additional pipelining so that multiple blocks can be transmitted as a continuous stream
if desired.



4.6.1 Link Interface

The link interface 289 provides the serial transmitter for the data channel. Data is transmitted in a
frame-synchronous format compatible with the link data format described in §5. Since the data is received
from the ring interface 280 (see below) asynchronously and at a rate somewhat higher than the link's
average data rate, the link interface contains a FIFO 282 to provide speed matching and frame
synchronization. The data is received from MINT memory via data ring interface 280, stored in FIFO 282,
processed by level 1 and 2 protocol handler 286, and transmitted to MAN switch 10 through the parallel to
serial converter 288 within link interface 289.



4.6.2 Ring Interface

The ring interface 280 logic controls the transfer of data from the MINT's buffer memory to the FIFO in
the link interface. It provides the following functions:

1. Establishing and maintaining synchronization with the ring's timing cycle.

2. Transfer of data from the ring to the link interface FIFO during the proper ring time slots.

3. Notifying the control section when the last word of a packet (memory block) is received.

4. Sending a new address and count (if available) to the memory TSAs 203,...,204 (FIG. 10) when the
last word of a packet is received and the condition of the FIFO 282 is such that the new packet will not
cause an overflow.

Unlike the XLH, the ILH relies on the TSAs to ensure that data words are received in sequence and with no
gaps within a block. Thus, maintaining word synchronization in this case consists simply of looking for
unexpected empty data time slots.



4.6.3 Control

The control portion of the ILH, controlled by sequencer 283, is responsible for providing the ring
interface with block descriptors received via the processor link interface 284 from the central control and
stored therefrom in address FIFO 285, notifying the central control via the processor link interface when
blocks have been retrieved from memory, and notifying the central control 20 when transmission of the final
block is complete.



4.6.3.1 Interaction with Central Control

There are only three basic interactions with the MINT's central control:

1. Receiving lists of block descriptors.

2. Informing the memory manager of blocks that have been retrieved from memory.

3. Informing the switch request queue manager when all blocks have been transmitted.

In the present design, all of these interactions are carried out over Transputer links to the appropriate
sections of the central control.



4.6.3.2 Interaction with TSAs

Like the XLH, the ILH uses its control time slots to send block descriptors (address and lengths) to the
TSAs. When the TSAs receive a descriptor from an ILH, however, they will immediately begin reading the
block from memory and placing the data on the ring. The length field from an ILH is significant and
determines the number of words that will be read by each TSA before moving on to the next block. The
TSAs also provide each ILH with registers to hold the next address and length, so that successive blocks
can be transmitted without gaps. Flow control is the responsibility of the ILH, however, and a new descriptor
should not be sent to the TSAs until there is enough room in the packet FIFO 282 to compensate for
reframing time and the difference in transmission rates.
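The flow control condition in the paragraph above can be illustrated with a rough calculation. The following Python sketch rests on a simple model that is an assumption of this example, not a statement of the actual design: while a block arrives from the ring the link drains the FIFO at a lower rate, and an allowance of reframing_words is added for the time the link is reframing. The names room_needed and may_send_descriptor are hypothetical.

```python
import math


def room_needed(block_words, ring_rate, link_rate, reframing_words):
    """Estimate FIFO growth, in words, while one block arrives.

    Assumed model: the block arrives at ring_rate while the link drains at
    link_rate (same units), plus a fixed allowance for reframing time.
    """
    if link_rate > ring_rate:
        raise ValueError("this sketch assumes the ring is at least as fast as the link")
    net_growth = block_words * (1 - link_rate / ring_rate)
    return math.ceil(net_growth) + reframing_words


def may_send_descriptor(free_words, block_words, ring_rate, link_rate, reframing_words):
    """True when the FIFO has enough free space to accept the next block."""
    return free_words >= room_needed(block_words, ring_rate, link_rate, reframing_words)
```

With a ring twice as fast as the link, a 100-word block and a 4-word reframing allowance, this model asks for 54 free words before the next descriptor is sent.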



4.7 MINT Central Control

FIG. 14 is a block diagram of MINT central control 20. This central control is connected to the four XLHs
16 of the MINT, the four ILHs 17 of the MINT, to data concentrator 136 and distributor 138 of the switch
control (see FIG. 7), and to an OA&M central control 352 shown in FIG. 15. The relationship of the central
control 20 with other units will first be discussed.

The MINT central control communicates with XLH 16 to provide memory block addresses for use by
the XLH in order to store incoming data in the MINT memory. XLH 16 communicates with the MINT central
control to provide the header of a packet to be stored in MINT memory, and the address where that packet
is to be stored. Memory manager 302 of MINT central control 20 communicates with ILH 17 to receive
information that memory has been released by an ILH because the message stored in those memory
blocks has been delivered, so that the released memory can be reused.

When queue manager 311 recognizes that the first network unit arriving for a particular NIM has been
queued in switch unit queue 314, which contains FIFO queues 316 for each possible destination NIM,
queue manager 311 sends a request to switch setup control 313 to request a connection in MAN switch 10
to that NIM. The request is stored in one of the queues 312 (priority) and 318 (regular) of switch setup
control 313. Switch setup control 313 administers these requests according to their priority and sends
requests to MAN switch 10, specifically to switch control data concentrator 136. For normal loads, the
queues 318 and 312 should be almost empty since requests can normally be made almost immediately
and will generally be processed by the appropriate MAN switch controller. For overload conditions, the
queues 318 and 312 become a means for deferring transmission of lower priority packets while retaining
the relatively fast transmission of priority packets. If experience so dictates, it may be desirable to move a
request from the regular queue to the priority queue if a priority packet for that destination NIM is received.
Requests queued in queues 318 and 312 do not tie up an ILH and an output link of circuit switch
10; this is in contrast to requests in the queues 160, 152 (FIG. 3) of an MAN switch controller 140 (FIG. 7).
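The handling of priority and regular setup requests described above can be sketched as two FIFO queues served in priority order. This Python sketch is illustrative: the class name SwitchSetupQueues and its methods are hypothetical, and the promotion of a pending regular request to the priority queue is shown only because the text mentions it as something that may prove desirable.

```python
from collections import deque


class SwitchSetupQueues:
    """Two FIFO queues of pending connection requests, keyed by destination NIM."""

    def __init__(self):
        self.priority = deque()
        self.regular = deque()

    def request(self, nim, priority=False):
        """Queue a connection request for a NIM unless one is already pending."""
        if nim in self.priority:
            return
        if priority:
            if nim in self.regular:
                # Optional behaviour: promote the pending regular request.
                self.regular.remove(nim)
            self.priority.append(nim)
        elif nim not in self.regular:
            self.regular.append(nim)

    def next_request(self):
        """Return the next NIM to request from the switch, priority first."""
        if self.priority:
            return self.priority.popleft()
        if self.regular:
            return self.regular.popleft()
        return None
```

Under normal load both queues would be almost empty, so the ordering only matters in overload, when regular requests wait behind priority ones.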

When switch setup control 313 recognizes that a connection has been established in switch 10, it
notifies NIM queue manager 311. The ILH 17 receives data from a FIFO queue 316 in switch unit queue
314 from NIM queue manager 311 to identify a queue of the memory locations of data packets which may
be transmitted to the circuit switch, and for each packet, a list of one or more ports on the NIM to which
that packet is to be transmitted. NIM queue manager 311 then causes ILH 17 to prefix the port number(s) to
each packet and to transmit data for each packet from memory 18 to switch 10. The ILH then proceeds to
transmit the packets of the queue and, when it has completed this task, notifies the switch setup control 313
that the connection in the circuit switch may be disconnected and notifies memory manager 302 of the
identity of the blocks of memory that can now be released because the data has been transmitted.

The MINT central control uses a plurality of high speed processors each of which has one or more
input/output ports. The specific processor used in this implementation is the Transputer manufactured by
INMOS Corporation. This processor has four input/output ports. Such a processor can meet the processing
demands of the MINT central control.

Packets come into the four XLHs 16. There are four XLH managers 305, source checkers 307, routers
309, and OA&M MINT processors 315, one corresponding to each XLH within the MINT; these processors,
operating in parallel to process the data entering each XLH, increase the total data processing capacity of
the MINT central control.

The header for each packet entering an XLH is transmitted along with the address where that packet is
being stored directly to an associated XLH manager 305, if the header has passed the hardware check of
the cyclic redundancy code (CRC) of the header performed by the XLH. If that CRC check fails, the packet
is discarded by the XLH which recycles the allocated memory block. The XLH manager passes the header
and the identity of allocated memory for the packet to the source checker 307. The XLH manager recycles
memory blocks if any of the source checker, router, or NIM queue manager find it impossible to transmit
the packet to a destination. Recycled memory blocks get used before memory blocks allocated by the
memory manager. Source checker 307 checks whether the source of the packet is properly logged in and
whether that source has access to the virtual network of the packet. Source checker 307 passes information
about the packet, including the packet address in MINT memory, to router 309 which translates the packet
group identification, effectively a virtual network name, and the destination name of the packet in order to
find out which output link this packet should be sent on. Router 309 passes the identification of the output
link to NIM queue manager 311 which identifies and chains packets received by the four XLHs of this MINT
which are headed for a common output link. After the first packet to a NIM queue has been received, the
NIM queue manager 311 sends a switch setup request to switch setup control 313 to request a connection
to that NIM. NIM queue manager 311 chains these packets in FIFO queues 316 of switch unit queue 314 so
that, when a switch connection is made in the circuit switch 10, all of these packets may be sent over that
connection at one time. Output control signal distributor 138 of the switch control 22 replies with an
acknowledgment when it has set up a connection. This acknowledgment is received by switch setup control
313 which informs NIM queue manager 311. NIM queue manager 311 then informs ILH 17 of the list of
chained packets in order that ILH 17 may transmit all of these packets. When ILH 17 has completed the
transmission of this set of chained packets over the circuit switch, it informs switch setup control 313 to
request a disconnect of the connection in switch 10, and informs memory manager 301 that the memory
which was used for storing the data of the message is now available for use for a new message. Memory
manager 301 sends this release information to memory distributor 303 which distributes memory to the
various XLH managers 305 for allocating memory to the XLHs.
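The chaining behaviour of the NIM queue manager described above can be sketched as follows. This is a simplified Python illustration, not the Transputer implementation: the class NimQueueManager, its method names and the use of a plain list to record setup requests are all hypothetical.

```python
from collections import defaultdict, deque


class NimQueueManager:
    """Chains packet descriptors per destination NIM (illustrative model)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # one FIFO per destination NIM
        self.setup_requests = []          # requests passed to switch setup control

    def enqueue(self, nim, descriptor):
        """Chain a packet for a NIM; the first packet triggers a setup request."""
        first = not self.queues[nim]
        self.queues[nim].append(descriptor)
        if first:
            self.setup_requests.append(nim)

    def connection_ready(self, nim):
        """On acknowledgment of the connection, hand the whole chain to the ILH."""
        chain = list(self.queues[nim])
        self.queues[nim].clear()
        return chain
```

Only one connection request is made per chain, so every packet queued for the same NIM before the connection is acknowledged is sent over that single connection.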

Source checker 307 also passes billing information to operation, administration and maintenance
(OA&M) MINT processor 315 in order to perform billing for that packet and to accumulate appropriate
statistics for checking on the data flow within the MINT and, after combination with other statistics, in the
MAN network. Router 309 also informs OA&M MINT processor 315 of the destination of the packet so that
the OA&M MINT processor can keep track of data concerning packet destinations for subsequent traffic
analysis. The outputs of the four OA&M MINT processors 315 are sent to MINT OA&M monitor 317 which
summarizes the data collected by the four OA&M MINT processors for subsequent transmission to OA&M
central control 352 (FIG. 14).

MINT OA&M monitor 317 also receives information from OA&M central control 352 for making changes
via OA&M MINT processor 315 in the router 309 data; these changes reflect additional terminals added to
the network, the movement of logical terminals (i.e., terminals associated with a particular user) from one
physical port to another, or the removal of physical terminals from the network. Data is also provided from
the OA&M central control 352 via the MINT OA&M monitor and the OA&M MINT processor 315
to source checker 307 for such data as a logical user's password and physical port as well as data
concerning the privileges of each logical user.



4.8 MINT Operation, Administration, and Maintenance Control System

FIG. 15 is a block diagram of the maintenance and control system of the MAN network. Operation,
administration, and maintenance (OA&M) system 350 is connected to a plurality of OA&M central controls
352. These OA&M controls are each connected to a plurality of MINTs, and within each MINT, to the MINT
OA&M monitor 317 of MINT central control 20. Since many of the messages from OA&M system 350 must
be distributed to all the MINTs, the various OA&M central controls are interconnected by a data ring. This
data ring transmits such data as the identification of the network interface module, hence the identification
of the output link, of each physical port that is added to the network so that this information may be stored
in the router processors 309 of every MINT in the MAN hub.

5 LINKS

5.1 Link Requirements

The links in the MAN system are used to transmit packets between the EUS and the NIM (EUSL) (links
14) and between the NIM and the MAN hub (XL) (links 3). Although the operation and the characteristics of
the data that is transferred on these links vary slightly with the particular application, the format used
on the links is the same. Having the formats be the same makes it possible to use common hardware and
software.

The link format is designed to provide the following features.

1. It provides a high data rate packet channel.

2. It is compatible with the proposed Metrobus "OS-1" format.

3. Interfacing is easier because of the word oriented synchronous format.

4. It defines how "packets" are delimited.

5. It includes a CRC for an entire "packet" (and another for the header).

6. The format ensures transparency of the data within a "packet".

7. The format provides a low bandwidth channel for flow control signaling.

8. Additional low bandwidth channels can be added easily.

9. Data scrambling ensures good transition density for clock recovery.



5.2 MAN Link Description and Reasoning

From a performance point of view, the faster the links are the better MAN will perform. This desire to
operate the links as fast as possible is tempered by the fact that faster links cost more. A reasonable
tradeoff between speed and cost is to use LED transmitters (like the AT&T ODL-200) and multimode fiber.
The use of ODL-200 transmitters and receivers puts an upper limit on the link speed of about 200 Mbit/sec.
From the MAN architecture point of view, the exact data rate of the links is not important since MAN does
not do synchronous switching. The data rate for the MAN links was chosen to be the same as the data rate
of the Metrobus Lightwave System "OS-1". The Metrobus format is described in M. S. Schaefer:
"Synchronous Optical Transmission Network for the Metrobus Lightwave Network", IEEE International
Communications Conference, June 1987, Paper 30B.1.1. Another data rate (and format) that could be used
in MAN will come from the specification of SONET, a link layer protocol specified by Bell Communications
Research Corp. for 150 Mbit/sec unchannelized links.



5.2.1 Level 1 Link Format

The MAN network uses the low level link format of Metrobus. Information on the link is carried by a
simple frame that is continuously repeated. The frame consists of 88 16-bit words. The first word contains
a framing sequence and 4 parity bits. In addition to this first word, three other words are overhead words.
These overhead words, which are used for internode communications in the Metrobus implementation, are
not used by MAN for the sake of Metrobus compatibility. The word oriented nature of the protocol makes
using it much simpler. A simple 16 bit shift register with parallel load can be used to transmit and a similar
shift register with parallel read out can be used to receive. At the 146.432 Mbit/sec link data rate, a 16 bit
word is transmitted or received every 109 ns. This approach makes it possible to implement much of the link
formatting hardware at conventional TTL clock rates. The word oriented nature of the protocol does put
some restrictions on the way the link is used, however. To keep the complexity of the hardware reasonable
it is necessary to use the bandwidth of the link in units of 16 bit words.
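The timing figures for the level 1 frame can be checked with a few lines of arithmetic. The Python sketch below assumes a link rate of 146.432 Mbit/sec and counts as payload every word that is not one of the four overhead words; the variable names are illustrative only.

```python
# Level 1 frame arithmetic (illustrative; the rate and word counts are
# taken as assumptions from the description above).
LINK_RATE_BPS = 146.432e6   # assumed link data rate in bits per second
WORD_BITS = 16
FRAME_WORDS = 88
OVERHEAD_WORDS = 4          # framing word plus three other overhead words

# Time to shift one 16 bit word onto or off the link.
word_time_ns = WORD_BITS / LINK_RATE_BPS * 1e9

# Duration of one complete frame.
frame_time_us = FRAME_WORDS * WORD_BITS / LINK_RATE_BPS * 1e6

# Words per frame left after the overhead words, and the resulting rate.
payload_words = FRAME_WORDS - OVERHEAD_WORDS
payload_rate_mbps = LINK_RATE_BPS * payload_words / FRAME_WORDS / 1e6

print(f"word time    : {word_time_ns:.1f} ns")
print(f"frame time   : {frame_time_us:.3f} us")
print(f"payload rate : {payload_rate_mbps:.3f} Mbit/sec over {payload_words} words")
```

The word time of roughly 109 ns corresponds to a word clock near 9 MHz, which is why most of the formatting hardware can run at conventional TTL clock rates.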



5.2.2 Level 2 Link Format

The link is used to move "packets", the basic unit of information transfer in MAN. To identify packets,
the format includes the specification of "SYNC" words and an "IDLE" word. When no packets are being
transmitted the "IDLE" word will fill all of the words that make up the primary channel bandwidth (words not
reserved for other purposes). Packets are delimited by a leading START_SYNC and a trailing END_SYNC
word. This scheme works well as long as the words with special meanings are never contained in the data
within a packet. Since restricting the data that can be sent in a packet is an unreasonable restriction, a
transparent data transfer technique must be used. MAN links employ a very simple word stuffing
transparency technique. Within the packet data, any occurrence of a special meaning word, like the
START_SYNC word, is preceded by another special word, the "DLE" word. This word stuffing transparency
was chosen because of the simplicity of implementation. This protocol requires simpler, lower speed logic
than is required for bit stuffing protocols like HDLC. The technique itself is similar to the time proven
techniques used in IBM's BISYNC links. In addition to the word stuffing used to ensure transparency, "FILL"
words are inserted if the data rate of the source is slightly less than the link data rate.
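The word stuffing technique described above can be sketched in a few lines of Python. The numeric values chosen for START_SYNC, END_SYNC, IDLE, DLE and FILL are hypothetical placeholders, since the actual code words are not given here, and the trailing CRC word is left out to keep the example short.

```python
# Hypothetical 16 bit code words; the real values are not specified here.
START_SYNC, END_SYNC, IDLE, DLE, FILL = 0xFFF1, 0xFFF2, 0xFFF3, 0xFFF4, 0xFFF5
SPECIAL = {START_SYNC, END_SYNC, IDLE, DLE, FILL}


def frame_packet(data):
    """Delimit a packet and escape any data word that equals a special word."""
    out = [START_SYNC]
    for word in data:
        if word in SPECIAL:
            out.append(DLE)  # word stuffing: precede the special value with DLE
        out.append(word)
    out.append(END_SYNC)
    return out


def unframe(stream):
    """Recover packets from a word stream, dropping IDLE, FILL and DLE words."""
    packets, current, escaped = [], None, False
    for word in stream:
        if current is None:
            if word == START_SYNC:
                current = []
            continue  # between packets everything else (e.g. IDLE) is ignored
        if escaped:
            current.append(word)  # escaped word is data, whatever its value
            escaped = False
        elif word == DLE:
            escaped = True
        elif word == FILL:
            continue  # rate matching filler inside a packet
        elif word == END_SYNC:
            packets.append(current)
            current = None
        else:
            current.append(word)
    return packets
```

Because the escape acts on whole 16 bit words, the receiver only needs a word comparator and one flag, rather than the bit level logic needed for bit stuffing.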

The last word in any packet is a cyclic redundancy check (CRC) word. This word is used to ensure
that any corruption of the data in a packet can be detected. The CRC word is computed on all of the data in
the packet, excluding any special words like "DLE" that may need to be inserted in the data stream for
transparency or other reasons. The polynomial that is used to compute the CRC word is the CRC-16 standard.
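A CRC word of this kind can be computed as in the following Python sketch. It uses the common CRC-16 polynomial x^16 + x^15 + x^2 + 1 in its bit-reversed form with a zero initial value; the initial value, the bit ordering and the big-endian serialization of the 16 bit words are assumptions of this example, and the function names are hypothetical.

```python
def crc16(data: bytes) -> int:
    """CRC-16 with polynomial x^16 + x^15 + x^2 + 1, LSB first, zero initial value."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc


def packet_crc(words):
    """CRC over a list of 16 bit data words (stuffed DLE words already removed)."""
    payload = b"".join(w.to_bytes(2, "big") for w in words)
    return crc16(payload)
```

The receiver computes the same value over the unstuffed data and compares it with the last word of the packet; a mismatch indicates that the packet was corrupted.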



To ensure good transition density for the optical receivers all of the data is scrambled (e.g., block 296,
FIG. 13) prior to transmission. The scrambling makes it less likely that long sequences of ones or zeros will
be transmitted on the link even though they may be quite common in the data actually being transmitted.
The scrambler and descrambler (e.g., block 252, FIG. 12) are well known in the art. The descrambler design
is self synchronizing, which makes it possible to recover from occasional bit errors without having to restart
the descrambler.
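The self synchronizing property can be illustrated with a generic multiplicative scrambler. The Python sketch below is not the circuit of blocks 296 and 252: the polynomial 1 + x^6 + x^7 is a hypothetical choice made for the example, and the function names are illustrative.

```python
TAPS = (6, 7)  # hypothetical scrambler polynomial 1 + x^6 + x^7


def _initial_history(state):
    # history[0] is the most recent line bit, history[6] the oldest.
    return [(state >> i) & 1 for i in range(max(TAPS))]


def scramble(bits, state=0):
    """Each line bit is the data bit XORed with earlier line bits."""
    history = _initial_history(state)
    out = []
    for d in bits:
        s = d
        for t in TAPS:
            s ^= history[t - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def descramble(bits, state=0):
    """Undo scramble(); the register is fed only with received line bits."""
    history = _initial_history(state)
    out = []
    for s in bits:
        d = s
        for t in TAPS:
            d ^= history[t - 1]
        out.append(d)
        history = [s] + history[:-1]
    return out
```

Because the descrambler register holds only received bits, a wrong starting state is flushed after seven bits, and a single line error disturbs the output only while the bad bit is in the register.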

5.2.3 Low Speed Channels and Flow Control

Not all of the payload words in the level 1 format are used for the level 2 format that carries packets.
Additional channels are included on the link by dedicating particular words within the frame. These low rate
channels 255, 295 (FIGS. 12 and 13) are used for MAN network control purposes. A packet delimiting
scheme similar to that used on the primary data channel is used on these low rate channels. The dedicated
words that make up low rate channels can be further divided down into individual bits for very low
bandwidth channels like the flow control channel. The flow control channel is used on the MAN EUSL
(between the EUS and the NIM) to provide hardware level flow control. The flow control channel (bit) from
the NIM to the EUS indicates to the EUS link transmitter whether or not it is allowed to transmit more
information. The design of the NIM is such that sufficient storage is available to absorb any data that is
transmitted prior to the EUS transmitter actually stopping after flow control is asserted. Data transmission
can be stopped either between packets or in the middle of a packet transmission. If it is between packets,
the next packet will not be sent until flow control is deasserted. If flow control is asserted in the
middle of a packet, it is necessary to suspend data transmission immediately and start sending the "Special
FILL" code word. This code word, like all others, is escaped with the "DLE" code word when it appears in
the body of a packet.



6 SYSTEM CLOCKING 

The MAN switch, as described in section 3, is an asynchronous space switch fabric with a very fast
setup controller. The data fabric of the switch is designed to reliably propagate digital signals with data rates
from DC to in excess of 200 Mbits/second. Since many paths can simultaneously exist through the fabric,
the aggregate bandwidth requirements of the MAN hub can be easily met by the fabric. This simple data
fabric is not without drawbacks, however. Because of mechanical and electrical constraints in implementing
the fabric, it is not possible for all paths through the switch to incur the same amount of delay. Because the
variations in path delay between different paths may be much greater than the bit time of the data going
through the switch, it is not possible to do synchronous switching. Any time that a path is set up from a
particular ILH in a MINT to an output port of the switch, there is no guarantee that data transmitted over that
path will have the same relative phase as the data transmitted over a previous path through the switch. To
use this high bandwidth switch it is therefore necessary to very quickly synchronize data coming out of a
switch port to the clock being used for the synchronous link to the NIM.

6.1 The Phase Alignment and Scrambler Circuit (PASC)

The unit that must do the synchronization of data coming from the switch and drive the outgoing link to
the NIM is called the Phase Alignment and Scrambler Circuit (PASC) (block 290, FIG. 13). Since the ILHs and
the PASC circuits are all part of the MAN hub, it is possible to distribute the same master clock to all of
them. This has several advantages. By using the same clock reference in the PASC as is used to transmit
data from the ILH, one can be sure that data cannot be coming into the PASC any faster than it is being
moved out of it over the link. This eliminates the need for large FIFOs and elaborate elastic store controllers
in the PASC. The fact that the bit rate of all data that comes into a PASC is exactly the same makes the
synchronization easier.

The ILH and the PASC can be thought of as a distributed link handler for the format described in the
previous section. The ILH creates the basic framing pattern into which the data is inserted and transmits it
through the fabric to a PASC. The PASC aligns this framing pattern with its own framing pattern, merges in
the low speed control channel and then scrambles the data for transmission.

The PASC synchronizes the incoming data to the reference clock by inserting an appropriate amount of
delay into the data path. For this to work the ILH must be transmitting each frame with a reference clock
that is slightly advanced from the reference clock used by the PASC. The number of bit times of advance
that the ILH requires is determined by the actual minimum delay that may be incurred in getting from the
ILH to the PASC. The amount of delay that the PASC must be capable of inserting into the data path is
dependent on the possible variation in path delays that may occur for different paths through the switch.

FIG. 23 is a block diagram of an illustrative embodiment of the invention. Unaligned data enters a
tapped delay line 1001. The various taps of the delay line are clocked into edge sampling latches
1003,...,1005 by a signal that is 180 degrees out of phase with the reference clock (REFCLK) and is
designated REFCLK-bar. The outputs of the edge sampling latches feed selection logic unit 1007 whose output
is used to control a selector 1013 described below. Selection logic 1007 includes a set of internal latches
for repeating the state of latches 1003,...,1005. The selection logic includes a priority circuit connected to
these internal latches, for selecting the highest rank order input which carries a logical "one". The output is
a coded identification of this selected input. The selection logic 1007 has two gating signals: a clear signal
and a signal from all of a group of internal latches of the selection logic. Between data streams, the clear
signal goes to a zero state causing the internal latches to accept new inputs. After the first "one" input has
been received from the edge sampling latches 1003,...,1005 in response to the first pulse of a data stream,
the state of the transparent latches is maintained until the clear signal goes back to the zero state. The clear
signal is set by out of band circuitry which recognizes the presence of a data stream.

The output of the tapped delay line also goes to a series of data latches 1009,...,1011. The input to the
data latches is clocked by the reference clock. The outputs of the data latches 1009,...,1011 are the inputs
to selector circuit 1013 which selects the output of one of these data latches based on the input from
selection logic 1007, and connects this output to the output of the selector 1013, which is the bit aligned
data stream as labeled on FIG. 23.
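The behaviour of the selection logic and selector can be modeled abstractly as follows. This Python sketch is a simplified illustration rather than a description of the hardware: it assumes that a higher tap index means a higher rank, it represents each clock period by one list of edge samples and one list of data samples, and the names priority_select and Aligner are hypothetical.

```python
def priority_select(edge_samples):
    """Index of the highest rank latch holding a logical one, or None."""
    for i in range(len(edge_samples) - 1, -1, -1):
        if edge_samples[i]:
            return i
    return None


class Aligner:
    """Holds one tap selection for the duration of a data stream."""

    def __init__(self):
        self.selected = None  # coded identification of the chosen tap

    def clear(self):
        """Between data streams: allow a new selection to be made."""
        self.selected = None

    def clock(self, edge_samples, data_samples):
        """One reference clock period; returns the aligned bit or None."""
        if self.selected is None:
            # First "one" seen from the edge sampling latches fixes the choice.
            self.selected = priority_select(edge_samples)
        if self.selected is None:
            return None
        return data_samples[self.selected]
```

Once a tap is chosen it is held for the whole data stream, which is why the length of each stream has to be limited.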

After the bits have been aligned, they are fed into a shift register (not shown) with tapped outputs to 

so feed the driver XLS. This Is to allow data streams to be transmitted synotsronoysly starting at sixteen bit 
boundaries. The operation of the shift register and auxiliary circuitry is substantially the same as that of the 
tapped delay line arrangement. 

The selection logic is implemented in commercially available priority selection circuits. The selector is
simply a one out of eight selector controlled by the output of the selection logic. If it is necessary to have a
finer alignment circuit using a one of sixteen selection, this can be readily implemented using the same
principles. The arrangement described herein appears to be especially attractive in situations where there is
a common source clock and where the length of each data stream is limited. The common source clock is
required since the clock is not derived from the incoming signal, but is, in fact, used to gate an incoming
signal appropriately. The limitation on the length of the block is required since a particular gating selection
is maintained for the entire block so that if the block length were too long, any substantial amount of phase
wandering would cause synchronism to be lost and bits to be dropped.

While in the present embodiment, the signal is passed through a tapped delay line and is sampled by
the clock and inverse clock, the alternative arrangement of passing the clock through a tapped delay line
and using the delayed clocks to sample the signal could also be used in some applications.



6.2 Clock Distribution

The MAN hub operation is very dependent on the use of a single master reference clock for all of the
ILH and PASC units in the system. The master clock must be distributed accurately and reliably to all of the
units. In addition to the basic clock frequency that must be distributed, the frame start pulse must be
distributed to the PASC and an advanced frame start pulse must be distributed to the ILH. All of these
functions are handled by using a single clock distribution link (fiber or twisted pair) going to each unit.

The information that is carried on those clock distribution links comes from a single clock source. This
information can be split in the electrical and/or optical domain and transmitted to as many destinations as
necessary. There is no attempt to keep the information on all of the clock distribution links exactly in phase
since the ILH and PASC are capable of correcting for phase differences no matter what the reason for this
difference. The information that is transmitted is simply alternating ones and zeros with two exceptions. The
occurrence of two ones in a row indicates an advanced frame pulse and the occurrence of two zeroes in a
row indicates a normal frame pulse. Each board that terminates one of these clock distribution links
contains a clock recovery module. The clock recovery module is the same as that used for the links
themselves. The clock recovery module will provide a very stable bit clock while additional logic extracts
the appropriate frame or advanced frame from the data itself. Since the clock recovery modules will
continue to oscillate at the correct frequency even without bit transitions for several bit times, even the
unlikely occurrence of a bit error will not affect the clock frequency. The logic that looks for the frame or
advanced frame signal can also be made tolerant of errors since it is known that the frame pulses are
periodic and extraneous pulses caused by bit errors can be ignored.
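
The frame signalling convention on the clock distribution link can be illustrated with a short sketch. It assumes the bit clock has already been recovered and the link is presented as a list of bits; the function name and the choice to report the index of the second bit of each pair are assumptions of the example. The periodicity check that would reject pulses caused by bit errors is not modeled.

```python
def find_frame_pulses(bits):
    """Scan a recovered bit sequence for the two exceptions to alternation.

    Returns (position, kind) pairs, where kind is "advanced" for two ones
    in a row and "normal" for two zeros in a row. The position is the index
    of the second bit of the pair (an arbitrary choice for this sketch).
    """
    pulses = []
    for i in range(1, len(bits)):
        if bits[i] == bits[i - 1]:
            kind = "advanced" if bits[i] == 1 else "normal"
            pulses.append((i, kind))
    return pulses


stream = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
pulses = find_frame_pulses(stream)
```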



7 NETWORK INTERFACE MODULE



7.1 Overview

The network interface module (NIM) connects one or more end user system links (EUSL) to one MAN
external link (XL). In so doing, the NIM performs concentration and demultiplexing of network transaction
units (i.e. packets and SUWUs), as well as insuring source identification integrity by affixing a physical
"source port number" to each outgoing packet. This latter function, in combination with the network
registration service described in §2.4, prevents a user from masquerading as another for the purpose of
gaining access to unauthorized network-provided services. The NIM thereby represents the boundary of the
MAN network proper; NIMs are owned by the network provider, while UIMs (described in §8) are owned by
the users themselves.

This section describes the basic functions of the NIM in more detail, and presents the NIM architecture.



7.2 Basic Functions

The NIM must perform the following basic functions: 
EUS Link interfacing. One or more interfaces must be provided to EUS link(s) (see § 2.2.5). The
downstream link (i.e., from NIM to UIM) consists of a data channel and an out-of-band channel used by the
NIM to flow control the upstream link when NIM input buffers become full. Because the downstream link is
not flow controlled, the flow control channel on the upstream link is unused. The Data and Header Check
Sequences (DCS, HCS) are generated by the UIM on the upstream link, and checked by the UIM on the
downstream link.

External link interfacing. The XL (§ 2.2.6) is very similar to the EUSL, but lacks DCS checking and
generation on both ends. This is to allow erroneous, but still potentially useful data to be delivered to the
UIM. The destination port numbers in network transaction units arriving on the downstream XL are checked
by the NIM, with illegal values resulting in dropped data.

Concentration and demultiplexing. Network transaction units arriving on the EUSLs contend for and are
statistically multiplexed to the outgoing XL. Those arriving on the XL are routed to the appropriate EUSL by
mapping the destination port number to one or more EUS links.

Source port identification. The port number of the source UIM is prepended to each network transaction unit
going upstream by port number generator 403 (FIG. 18). This port number will be checked against the MAN
address by the MINT to prevent unauthorized access to services (including the most basic data transport
service) by "imposters".

7.3 NIM Architecture and Operation

The architecture of the NIM is depicted in FIG. 18. The following subsections briefly describe the
operation of the NIM.



7.3.1 Upstream Operation 

Incoming network transaction units are received from the UIMs at their EUSL interface 400 receivers
402, are converted to words in serial to parallel converters 404 and are accumulated in FIFO buffers 94.
Each EUSL interface is connected to the NIM transmit bus 95, which consists of a parallel data path, and
various signals for bus arbitration and clocking. When a network transaction unit has been buffered, the
EUSL interface 400 arbitrates for access to the transmit bus 95. Arbitration proceeds in parallel with data
transmission on the bus. When the current data transmission is complete, the bus arbiter awards bus
ownership to one of the competing EUSL interfaces, which begins transmission. For each transaction, the
EUSL port number, inserted at the beginning of each packet by port number generator 403, is transmitted
first, followed by the network transaction unit. Within an XL interface 440, the XL transmitter 96 provides the
bus clock, and performs parallel to serial conversion 442 and data transmission on the upstream XL 3.
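
The upstream concentration step can be pictured as follows. This is a simplified sketch: the text does not specify the arbitration policy, so round-robin order over port numbers is an assumption, as are the function name and the representation of a network transaction unit as an opaque value.

```python
from collections import deque


def multiplex_upstream(fifos):
    """Drain per-port FIFOs onto one upstream stream, tagging each unit.

    fifos maps an EUSL port number to a deque of buffered units. Each unit
    is emitted as (port_number, unit), modeling the port number being sent
    first. Round-robin arbitration is an assumption of this sketch.
    """
    upstream = []
    while any(fifos.values()):
        for port in sorted(fifos):
            if fifos[port]:
                upstream.append((port, fifos[port].popleft()))
    return upstream


fifos = {1: deque(["a1", "a2"]), 2: deque(["b1"])}
sent = multiplex_upstream(fifos)
```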


7.3.2 Downstream Operation 

Network transaction units arriving from the MINT on the downstream XL 3 are received within XL
interface 440 by the XL receiver 445, which is connected via serial to parallel converter 448 to the NIM
receive bus 430. The receive bus is similar to, but independent of the transmit bus. Also connected to the
receive bus via a parallel to serial converter 408 are the EUSL interface transmitters 410. The XL receiver
performs serial to parallel conversion, provides the receive bus clock, and sources the incoming data onto
the bus. Each EUSL interface decodes the EUSL port number associated with the data, and forwards the
data to its EUSL if appropriate. More than one EUSL interface may forward the data if required, as in a
broadcast or multicast operation. Each decoder 409 checks the receive bus 430 while port number(s) are
being transmitted to see if the following packet is destined for the end user of this EUSL interface 400; if so,
the packet is forwarded to transmitter 410 for delivery to an EUSL 14. Illegal EUSL port numbers (e.g.
violations of the error coding scheme) result in the data being dropped (i.e. not forwarded by any EUSL
interface). Decode block 409 is used to gate information destined for a particular EUS link from transmit bus
95 to the parallel/serial converter 408 and transmitter 410.
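
The downstream decode and forward behavior can be summarized in a few lines. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and membership in a set of legal port numbers stands in for the error coding check, which is not modeled.

```python
def route_downstream(dest_ports, unit, interfaces, legal_ports):
    """Return {port: unit} for every EUSL interface that forwards the unit.

    dest_ports  -- port number(s) transmitted ahead of the unit
    interfaces  -- port numbers that have an EUSL interface on this NIM
    legal_ports -- port numbers accepted as legal (stand-in for the error
                   coding check described in the text)
    """
    if any(port not in legal_ports for port in dest_ports):
        return {}  # illegal port number: no interface forwards the data
    return {port: unit for port in dest_ports if port in interfaces}


interfaces = {1, 2, 3}
legal = {1, 2, 3, 4}
unicast = route_downstream([2], "pkt", interfaces, legal)
multicast = route_downstream([1, 3], "pkt", interfaces, legal)
dropped = route_downstream([1, 9], "pkt", interfaces, legal)
```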



8 INTERFACING TO MAN

8.1 Overview 

A user interface module (UIM) consists of the hardware and software necessary to connect one or more
end user systems (EUS), local area networks (LAN), or dedicated point-to-point links to a single MAN end
user system link (EUSL) 14. Throughout this section, the term EUS will be used to generically refer to any
of these network end user systems. Clearly, a portion of the UIM used to connect a particular type of EUS
to MAN is dependent on the architecture of that EUS, as well as the desired performance, flexibility, and
cost of the implementation. Some of the functions provided by a UIM, however, must be provided by every
UIM in the system. It is therefore convenient to view the architecture of a UIM as having two distinct halves:
the network interface, which provides the EUS-independent functionality, and the EUS interface, which
implements the remainder of the UIM functions for the particular type of EUS being connected.

Not all EUSs will require the performance inherent in a dedicated external link. The concentration
provided by a NIM (described in §7) is an appropriate way to provide access to a number of EUSs which
have stringent response time requirements along with the instantaneous I/O bandwidth necessary to
effectively utilize the full MAN data rate, but which do not generate the volume of traffic necessary to
efficiently load the XL. Similarly, several EUSs or LANs could be connected to the same UIM via some
intermediate link (or the LANs themselves). In this scenario, the UIM acts as a multiplexer by providing
several EUS (actually LAN or link) interfaces to go with one network interface. This method is well suited to
EUSs which do not allow direct connections to their system busses, and which provide only a link
connection that is itself limited in bandwidth. End users can provide their multiplexing or concentration at a
UIM and MAN can provide further multiplexing or concentration at the NIM.

This section examines the architectures of both the network interface and EUS interface halves of the
UIM. The functions provided by the network interface are described, and the architecture is presented. The
heterogeneity of EUSs that may be connected to MAN does not allow such a generic treatment of the EUS
interfaces. Instead, the EUS interface design options are explored, and a specific example of an EUS is
used to illustrate one possible EUS interface design.

8.2 UIM - Network Interface

The UIM network interface implements the EUS-independent functions of the UIM. Each network
interface connects one or more EUS interfaces to a single MAN EUSL.



8.2.1 Basic Functions

The UIM network interface must perform the following functions:

EUS Link interfacing. The interface to the EUS Link includes an optical transmitter and receiver, along with
the hardware necessary to perform the link level functions required by the EUSL (e.g. CRC generation and
checking, data formatting, etc.).

Data buffering. Outgoing network transaction units (i.e. packets and SUWUs) must be buffered so that they
may be transmitted on the fast network link without gaps. Incoming network transaction units are buffered
for purposes of speed matching, and level three (and above) protocol processing.

Buffer memory management. The packets of one LUWU may arrive at the receive UIM interleaved with
those of another LUWU. In order to support this concurrent reception of several LUWUs, the network
interface must manage its receive buffer memory in a dynamic fashion, allowing incoming packets to be
chained together into LUWUs as they arrive.

Protocol processing. Outgoing LUWUs must be fragmented into packets for transmission into the network.
Similarly, incoming packets must be recombined into LUWUs for delivery to the receiving process within
the EUS.
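
The last two functions, chaining interleaved packets and fragmenting or recombining LUWUs, can be illustrated with a small sketch. The packet fields used here ("luwu", "seq", "last", "payload") are hypothetical and do not represent the MAN header format; the sketch also assumes that the final packet of a LUWU arrives after its other packets.

```python
def fragment(luwu_id, data, max_payload):
    """Split one LUWU into packets (dicts with hypothetical header fields)."""
    chunks = [data[i:i + max_payload]
              for i in range(0, len(data), max_payload)] or [b""]
    return [
        {"luwu": luwu_id, "seq": n, "last": n == len(chunks) - 1, "payload": c}
        for n, c in enumerate(chunks)
    ]


def reassemble(packets):
    """Chain interleaved packets per LUWU; return the completed LUWUs."""
    chains, done = {}, {}
    for packet in packets:
        chains.setdefault(packet["luwu"], []).append(packet)
        if packet["last"]:
            parts = sorted(chains.pop(packet["luwu"]), key=lambda p: p["seq"])
            done[packet["luwu"]] = b"".join(p["payload"] for p in parts)
    return done


a = fragment("A", b"abcdefgh", 3)
b = fragment("B", b"12345", 2)
# Packets of the two LUWUs arrive interleaved at the receive UIM.
arrivals = [a[0], b[0], a[1], b[1], b[2], a[2]]
luwus = reassemble(arrivals)
```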




8.2.2 Architectural Options

Clearly, all of the functions enumerated in the previous subsection must be performed in order to
interface any EUS to a MAN EUSL. However, some architectural decisions must be made regarding where
these functions are performed; i.e., whether they are internal or external to the host itself.

The first two functions must be located external to the host, although for different reasons. The first and
lowest level function, that of interfacing to the MAN EUS Link, must be implemented externally simply
because it consists of special purpose hardware which is not part of a generic EUS. The EUS link interface
simply appears as a bidirectional I/O port to the remainder of the UIM network interface. On the other hand,
the second function, data buffering, cannot be implemented in existing host memory because the bandwidth
requirements are too stringent. On reception, the network interface must be able to buffer incoming packets
or SUWUs back-to-back at the full network data rate (150 Mb/s). This data rate is such that it is generally
impossible to deposit incoming packets directly into EUS memory. Similar bandwidth constraints apply to
packet and SUWU transmission as well since they must be completely buffered and then transmitted at the
full 150 Mb/s rate. These constraints make it desirable to provide the necessary buffer memory external to
the EUS. It should be noted that while FIFO memory will suffice to provide the necessary speed matching
for transmission, the lack of flow control on reception along with the interleaving of received packets
necessitate that a larger amount of random access memory be provided as receive buffer memory. For
MAN, the size of receive buffer memory may range from 256 Kbytes to 1 Mbyte. The particular size
depends on the interrupt latency of the host and on the maximum size LUWU allowed by the host software.
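
To give a feel for how the buffer size relates to host latency, the following sketch computes how long back-to-back traffic at the full 150 Mb/s rate would take to fill an initially empty receive buffer if nothing were being removed. It is a rough bound only; it assumes 1 Kbyte = 1024 bytes and ignores header overhead and any concurrent draining by the host.

```python
LINK_RATE_BITS_PER_S = 150e6  # full network data rate stated in the text


def fill_time_ms(buffer_bytes):
    """Milliseconds for full-rate traffic to fill an empty, undrained buffer."""
    return buffer_bytes * 8 / LINK_RATE_BITS_PER_S * 1000


small = fill_time_ms(256 * 1024)    # 256 Kbyte receive buffer
large = fill_time_ms(1024 * 1024)   # 1 Mbyte receive buffer
```

Under these assumptions the two sizes correspond to roughly 14 ms and 56 ms of sustained full-rate reception.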

The final two functions involve processing, which could conceivably be performed by the host
processor itself. The third function, buffer memory management, involves the timely allocation and
deallocation of blocks of receive buffer memory. The latency requirement associated with the allocation
operation is stringent, due once more to the high data rates and the possibility of packets arriving back-to-back.
However, this can be alleviated (for reasonable burst sizes) by pre-allocating several blocks of
memory. It is possible, therefore, for the host processor to manage the receive packet buffers. Similarly, the
host processor may or may not assume the burden of the fourth function, that of MAN protocol processing.

The location of these final two functions determines the level at which the EUS connects to the UIM. If
the host CPU assumes the burden for packet buffer memory management and MAN protocol processing
(the "local" configuration), then the unit of data transferred across the EUS interface is a packet, and the
host is responsible for fragmenting and recombining LUWUs. If, on the other hand, those functions are
off-loaded to another processor in the UIM, the front end processor (FEP) configuration, the unit of data
transferred across the EUS interface is a LUWU. While in theory, subject to interleaving constraints at the
EUS interface, the unit of data transferred may be any amount less than or equal to the entire LUWU, and
the units delivered by the transmitter need not be the same size as those accepted by the receiver, for a
general and uniform solution, useful for a variety of EUSs, the LUWU is to be preferred as the basic unit.
The FEP configuration offloads the majority of the processing burden from the host CPU, as well as
providing for a higher level EUS interface, thereby hiding the details of the network operation from the host.
With the FEP, the host knows only about LUWUs, and can control their transmission and reception at a
higher, less CPU intensive level.

Although a lower cost interface is possible utilizing the local configuration, the network interface
architecture described in the following section is a FEP configuration more characteristic of that required by
some of the high performance EUS that are natural users of a MAN network. An additional reason for
choosing the FEP configuration initially is that it is better suited for interfacing MAN to a LAN such as
ETHERNET, in which case there is no "host CPU" to provide buffer memory management and protocol
processing.

8.2.3 Network Interface Architecture

The architecture of the UIM network interface is depicted in FIG. 17. The following subsections briefly
describe the operation of the UIM network interface by presenting scenarios for the transmission and
reception of data. An FEP-type architecture is employed, i.e., receive buffer memory management and
MAN network layer protocol processing are performed external to the host CPU of the EUS.



8.2.3.1 Transmission of Data 


The main responsibilities of the network interface on transmission are to fragment the arbitrary sized
transmit user work units (UWUs) into packets (if necessary), encapsulate the user data in the MAN header
and trailer, and transmit the data to the network. To begin transmission, a message from the EUS
requesting transmission of a LUWU traverses the EUS interface and is handled by network interface
processing 450, which also implements memory management and protocol processing functions. For each
packet, the protocol processor portion of the interface processing 450 formulates a header and writes it into
the transmit FIFO 15. Data for that packet is then transferred across the EUS interface 451 into the transmit
FIFO 15 within link handler 460. When the packet is completely buffered, the link handler 460 transmits it
onto the MAN EUS link using transmitter 454, followed by the trailer, which was computed by the link
handler 460. The link is flow controlled by the NIM to ensure that the NIM packet buffers do not overflow.
This transmission process is repeated for each packet. The transmit FIFO 15 contains space for two
maximum length packets so that packet transmission may occur at the maximum rate. The user is notified
via the EUS interface 451 when the transmission is complete.




8.2.3.2 Reception of Data

Incoming data is received by receiver 458 and loaded at the 150 Mb/s link rate into elastic buffer 452.
Dual-ported video RAM is utilized for the receive buffer memory 90, and the data is unloaded from the
elastic buffer and loaded into the shift register 464 of receive buffer memory 90 via its serial access port.
Each packet is then transferred from the shift register into the main memory array 466 of the receive buffer
memory under the control of the receiver DMA sequencer 452. The block addresses used to perform these
transfers are provided by the network interface processing arrangement 450 of UIM 13 via the buffer
controller 456, which buffers a small number of addresses in hardware to relieve the strict latency
requirement which would otherwise be imposed by back-to-back SUWUs. Block 450 is composed of
blocks 530, 540, 542, 550, 552, 554, 556, 558, 560, and 562 of FIG. 19. Because the network interface
processing has direct access to the buffer memory via its random access port, headers are not stripped off;
rather they are placed into buffer memory along with the data. The receive queue manager 558 within 450
handles the headers and, with input from the memory manager 550, keeps track of the various SUWUs and
LUWUs as they arrive. The EUS is notified of the arrival of data by the network interface processing
arrangement 450 via the EUS interface. The details of how data is delivered to the EUS are a function of the
particular EUS interface being employed, and are described, for example, in section 8.3.3.2.
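
The role of the small hardware queue of block addresses can be sketched as follows. The class and method names are hypothetical, and the queue depth is chosen arbitrarily for the example; the point is only that the sequencer takes an address without waiting for the processor, which refills the queue when it has time.

```python
from collections import deque


class BufferController:
    """Holds a few pre-supplied block addresses for the DMA sequencer."""

    def __init__(self, depth):
        self.depth = depth
        self.addresses = deque()

    def refill(self, free_blocks):
        """Processor side: top up the queue from a list of free addresses."""
        while len(self.addresses) < self.depth and free_blocks:
            self.addresses.append(free_blocks.pop(0))

    def next_block(self):
        """Sequencer side: take an address at once; None means none is ready."""
        return self.addresses.popleft() if self.addresses else None


free = [0x0000, 0x0800, 0x1000, 0x1800, 0x2000]
controller = BufferController(depth=3)
controller.refill(free)
# Three units arrive back-to-back before the processor can respond again.
burst = [controller.next_block() for _ in range(3)]
starved = controller.next_block()
```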



8.3 UIM - EUS Interfaces



8.3.1 Philosophy

This section describes the "half" of the network interface that is EUS dependent. The basic function of
the EUS interface is the delivery of data between the EUS memory and the UIM network interface, in both
directions. Each particular EUS interface will define the protocol to effect delivery, the format of data and
control messages, and the physical path for control and data. Each side of the interface has to implement a
flow control mechanism to protect itself from being overrun. The EUS must be able to control its own
memory and the flow of data into it from the network, and the network has to be able to protect itself as
well. Only at this basic functional level is it possible to talk about commonality in EUS interfaces. EUS
interfaces will be different because of EUS hardware and system software differences. The needs of the
applications using the network, coupled with the capabilities of the EUS, will also force interface design
decisions dealing with performance and flexibility. There will be numerous interface choices even for a
particular EUS.

This set of choices means that the interface hardware can range from simple designs with few
components to complex designs including sophisticated buffering and memory management schemes.
Control functions in the interface can range from simple EUS interfaces to handling network level 3
protocols and even higher level protocols for distributed applications. Software in the EUS can also range
from straightforward data transmission schemes that fit underneath existing networking software, to more
extensive new EUS software that would allow very flexible uses of the network or allow the highest
performance that the network has to offer. These interfaces must be tailored to the specific existing EUS
hardware and software systems, but there must also be an analysis of the cost of interface features in
comparison to the benefits they would deliver to the network applications running in these EUSs.

8.3.2 EUS Interface Options

The tradeoff between a front end processor (FEP) and EUS processing is one example of different
interface approaches to accomplish the same basic function. Consider variations in receive buffering. A
specialized EUS architecture with a high performance system bus could receive network packet messages
directly from the network links. However, usually the interface will at least buffer packet messages as they
come off the link, before they are delivered into EUS memory. Normally EUSs, either transmitting to or
receiving from the network, do not know (or want to know) anything about the internal packet message size. In
that case, the receiving interface might have to buffer multiple packets that come from the LUWU of data
that is the natural sized transmission unit between the transmit and receive EUSs. Each one of these three
receive buffering situations is possible and each would require a significantly different EUS interface to
transfer data into the EUS memory. If the EUS has a particular need to process network packet messages
and has the processing power and system bus performance to devote to that task, then the EUS dependent
portion of the network interface would be simple. However, often it will be desirable to off-load that
processing into the EUS interface and improve the EUS performance.

Different transmit buffering approaches also illustrate the tradeoff between FEP and EUS processing.
For a specialized application, an EUS with high performance processor and bus could send network packet
messages directly into the network. But if the application used EUS transaction sizes that were much larger
than the packet message size, it might take too much of the EUS processing to produce packet messages
on its own. An FEP could offload that work of doing this level 3 network protocol formatting. This would also
be the case where the EUS wishes to be independent of the internal network message size, or where it has
a diverse set of network applications with a great variation in transmission size.

Depending on the hardware architecture of the EUS, and the level of performance desired, there is the
choice between programmed I/O and DMA to move data between EUS memory and the network interface.
In the programmed I/O approach, probably both control and data will move over the same physical path. In
the DMA approach there will be some kind of shared memory interface to move control information in an
EUS interfacing protocol, and a DMA controller in the EUS interface to move data between buffer memory
and EUS memory over the EUS system bus without using EUS processor cycles.

There are several alternatives that exist for the location of EUS buffering for network data. The data
could be buffered on a front end processor network controller circuit board with its own private memory.
This memory can be connected to the EUS by busses using DMA transfer or dual ported memory
accessed via a bus or dual ported memory located on the CPU side of a bus using private busses. The
application now must access the data. Various techniques are available; some involve mapping the end user
work space directly to the address space used by the UIM to store the data. Other techniques require the
operating system to further buffer the data and recopy into the user's private address space.

Options exist in writing the driver level software in the EUS that is responsible for moving control and
data information over the interface. The driver could also implement the EUS interface protocol processing
as well as just moving bits over the interface. For the driver to still run efficiently the protocol processing in
the driver might not be very flexible. For more flexibility based on a particular application, the EUS interface
protocol processing could be moved up to a higher level. Closer to the application, more intelligence could
be applied to the interface decisions, at the expense of more EUS processing time. The EUS could
implement various interface protocol approaches for delivery of data to and from the network: prioritization,
preemption, etc. Network applications that did not require such flexibility could use a more direct interface
to the driver and the network.

So, there are a variety of choices to be made at different levels in the system in both the hardware and
the software.



8.3.3 Implementation Example: SUN Workstation Interface

To illustrate the EUS dependent portion of the interface we describe one specific interface. The 
interface is to the Sun-3 VME bus based workstations manufactured by Sun Microsystems, Inc. This is an
example of a single EUS connected to a single network interface. The EUS also allows connection directly
to its system bus. The UIM hardware is envisioned as a single circuit board that plugs into the VME bus
system bus.

First, there follows a description of the Sun I/O architecture, and then a description of the choices made
in designing the interface hardware, the interface protocol, and the connection to new and existing network
applications software.



8.3.3.1 SUN Workstation I/O Architecture

The Sun-3's I/O architecture, based on the VME bus structure and its memory management unit
(MMU), provides a DMA approach called direct virtual memory access (DVMA). FIG. 17 shows the Sun
DVMA. DVMA allows devices on the system bus to do DMA directly to Sun processor memory, and also
allows main bus masters to do DMA directly to main bus slaves without going through processor memory. It
is called "virtual" because the addresses that a device on the system bus uses to communicate with the
kernel are virtual addresses similar to those the CPU would use. The DVMA approach makes sure that all
addresses used by devices on the bus are processed by the MMU, just as if they were virtual addresses
generated by the CPU. The slave decoder 512 (FIG. 16) responds to the lowest megabyte of VME bus
address space (0x0000 0000 - 0x000f ffff, in the 32 bit VME address space) and maps this megabyte into
the most significant megabyte of the system virtual address space (0xff0 0000 - 0xfff ffff in the 28 bit
virtual address space). (0x means that the subsequent characters are hexadecimal characters.) When the
driver needs to send the buffer address to the device, it must strip off the high 8 bits from the 28 bit
address, so that the address that the device puts on the bus will be in the low megabyte (20 bits) of the
VME address space.
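
The address arithmetic in this paragraph can be checked with a short sketch. The constant and function names are illustrative only and are not Sun kernel interfaces; the numeric ranges are those given in the text.

```python
DVMA_BASE = 0xFF00000     # start of the top megabyte of the 28 bit virtual space
DVMA_TOP = 0xFFFFFFF      # end of the 28 bit virtual space
LOW_MEG_MASK = 0x00FFFFF  # low 20 bits: the lowest megabyte of VME space


def virtual_to_vme(kernel_vaddr):
    """Strip the high 8 bits of a 28 bit DVMA address for use by the device."""
    if not DVMA_BASE <= kernel_vaddr <= DVMA_TOP:
        raise ValueError("address is outside the DVMA region")
    return kernel_vaddr & LOW_MEG_MASK


def vme_to_virtual(vme_addr):
    """Model the slave decoder mapping of a low-megabyte VME address."""
    if not 0 <= vme_addr <= LOW_MEG_MASK:
        raise ValueError("address is outside the lowest megabyte")
    return DVMA_BASE | vme_addr


device_addr = virtual_to_vme(0xFF12345)
round_trip = vme_to_virtual(device_addr)
```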

In FIG. 18, the CPU 500 drives a memory management unit 502, which is connected to a VME bus 504
and on board memory 506 that includes a buffer 508. The VME bus communicates with DMA devices 510.
Other on board bus masters, such as an ETHERNET access chip, can also access memory 508 via MMU
502. Thus, devices can only make DVMA transfers in memory buffers that are reserved as DVMA space in
these low (physical) memory areas. The kernel does however support redundant mapping of physical
memory pages into multiple virtual addresses. In this way, a page of user memory (or kernel memory) can
be mapped into DVMA space in such a way that the data appears in (or comes from) the address space of
the process requesting that operation. The driver uses a routine called mbsetup to set up the kernel page
maps to support this direct user space DVMA.



8.3.3.2 SUN UIM - EUS Interface Approach

As mentioned above, there are many options in designing a particular interface. With the Sun-3
interface, the design chosen was a DMA transfer approach, an interface with FEP capabilities, an interface
with high performance matching the system bus, and EUS software flexible enough to allow various new and
existing network applications to use the network. FIG. 19 shows an overview of the interface to the Sun-3.

The Sun-3's are systems with potentially many simultaneous processes running in support of the
window system, and multiple users. The DMA and FEP approaches were chosen to offload the Sun
processor while the network transfers are taking place. The UIM hardware is envisioned as a single circuit
board that plugs into the VME bus system bus. With the chance to connect directly to the system bus it is
desirable to attempt the highest performance interface possible. Sun's DVMA provides a means to move
data efficiently to and from processor memory. There is a DMA controller 92 in the UIM (FIG. 4) to move
data from the UIM to EUS memory and data from EUS memory to the UIM over the bus, and there will be a
shared memory interface to move control information in the host interfacing protocol. The front end
processor (FEP) approach means that the data from the network is presented to the EUS at a higher level.
Level 3 protocol processing has been performed and packets have been linked together into LUWUs, the
user's natural sized unit of transmission. With the potential variety of network applications that could be
running on the Sun, the FEP approach means that EUS software does not have to be tightly coupled to the
internal network packet format.

The Sun-3 DVMA architecture will limit the EUS transaction sizes to a maximum of one megabyte. If
user buffers are not locked in, then kernel buffers would be used as an intermediate step between the
device and the user, with the associated performance penalty for the copy operation. If transfers are going
to be made directly to user space, using the "mbsetup" approach, the user's space will be locked into
memory, not available for swapping, during the whole transfer process. This is a tradeoff; it ties up the
resources in the machine, but it may be more efficient if it avoids a copy operation from some other buffer
in the kernel.

The Sun system has existing network applications running on ETHERNET, for example, its Network
File System (NFS). To run these existing applications on MAN but still leave open the possibility for new
applications that could use the expanded capabilities of MAN, we needed flexible EUS software and a
flexible interface protocol able to handle a variety of network applications simultaneously.

FIG. 18 is a functional overview of the operation and interfaces among the NIM, UIM, and EUS. The
specific EUS shown in this illustrative example is a Sun-3 workstation, but the principles apply to other end
user systems having greater or lesser sophistication. Consider first the direction from the MINT via the NIM
and UIM to the EUS. As shown in FIG. 4, data that is received from MINT 11 over link 3 is distributed to
one of a plurality of UIMs 13 over links 14 and is stored in receive buffer memory 90 of such a UIM, from
which data is transmitted in a pipelined fashion over an EUS bus 92 having a DMA interface to the
appropriate EUS. The control structure for accomplishing this transfer of data is shown in FIG. 19, which
shows that the input from the MINT is controlled by a MINT to NIM link handler 520, which transmits its
output under the control of router 522 to one of a plurality of NIM to UIM link handlers (N/U LH) 524.
MINT/NIM link handler (M/N LH) 520 supports a variant of the Metrobus physical layer protocol. The NIM to
UIM link handler 524 also supports the Metrobus physical layer protocol in this implementation, but other
protocols could be supported as well. It is possible that different protocols could coexist on the same NIM.
The output of the N/U LH 524 is sent over a link 14 to a UIM 13, where it is buffered in receive buffer
memory 90 by NIM/UIM link handler 552. The buffer address is supplied by memory manager 550, which
manages free and allocated packet buffer lists. The status of the packet reception is obtained by N/U LH
552, which computes and verifies the checksum over header and data, and outputs the status information to
receive packet handler 556, which pairs the status with the buffer address received from memory manager
550 and queues this information on a received packet list. Information about received packets is then
transferred to receive queue manager 558, which assembles packet information into queues per LUWU and
SUWU, and which also keeps a queue of LUWUs and SUWUs about which the EUS has not yet been
notified. Receive queue manager 558 is polled for information about LUWUs and SUWUs by the EUS via
the EUS/UIM link handler (E/U LH) 540, and responds with notification messages via UIM/EUS link handler
(U/E LH) 562. Messages which notify the EUS of the reception of a SUWU also contain the data for the
SUWU, thus completing the reception process. In the case of a LUWU, however, the EUS allocates its
memory for reception and issues a receive request via E/U LH 540 to receive request handler 560, which
formulates a receive worklist and sends it to resource manager 554, which controls the hardware and
effects the data transfer over EUS bus 92 (FIG. 4) via a DMA arrangement. Note that the receive request
from the EUS need not be for the entire amount of data in the LUWU; indeed, all of the data may not have
even arrived at the UIM when the EUS makes its first receive request. When subsequent data for this
LUWU arrives, the EUS will again be notified and will have an opportunity to make additional receive
requests. In this fashion, the reception of the data is pipelined as much as possible in order to reduce
latency. Following data transfer, receive request handler 560 informs the EUS via U/E LH 562, and directs
memory manager 550 to de-allocate the memory for that portion of the LUWU that was delivered, thus
making that memory available for new incoming data.

In the reverse direction, i.e., from EUS 26 to MINT 11, the operation is controlled as follows: driver 570
of EUS 26 sends a transmit request to transmit request handler 542 via U/E LH 562. In the case of a
SUWU, the transmit request itself contains the data to be transmitted, and transmit request handler 542
sends this data in a transmit worklist to resource manager 554, which computes the packet header and
writes both header and data into buffer 15 (FIG. 4), from which it is transmitted to NIM 2 by UIM/NIM link
handler 548 when authorized to do so via the flow control protocol in force on link 14. The packet is
received at NIM 2 by UIM/NIM link handler 530 and stored in buffer 94. Arbiter 532 then selects among a
plurality of buffers 94 in NIM 2 to select the next packet or SUWU to be transmitted under the control of
NIM/MINT link handler 534 on MINT link 3 to MINT 11. In the case of a LUWU, transmit request handler
542 decomposes the request into packets and sends a transmit worklist to resource manager 554, which,
for each packet, formulates the header, writes the header into buffer 15, controls the hardware to effect the
transfer of the packet data over EUS bus 92 via DMA, and directs U/N LH 548 to transmit the packet when
authorized to do so. The transmission process is then as described for the SUWU case. In either case,
transmit request handler 542 is notified by resource manager 554 when transmission of the SUWU or
LUWU is complete, whereupon driver 570 is notified via U/E LH 562 and may release its transmit buffers if
desired.

FIG. 19 also shows details of the internal software structure of EUS 26. Two types of arrangements are
shown, in one of which (blocks 572, 574, 576, 578, 580) the user system performs level 3 and higher
functions. Shown in FIG. 19 is an implementation based on Network of the Advanced Research Projects
Administration of the U.S. Department of Defense (ARPAnet) protocols, including an internet protocol 580
(level 3) and a transmission control protocol (TCP) and user datagram protocol (UDP) block 578 (TCP being
used for connection oriented service and UDP being arranged for connectionless service). At higher levels
are the remote procedure call (block 576), the network file server (block 574) and the user programs 572.
Alternatively, the services of the MAN network can be directly invoked by user programs (block 582) which
directly interface with driver 570, as indicated by the null block 584 between the user and the driver.

8.3.3.3 EUS Interface Functions

The main functional parts of the transmit EUS interface are a control interface with the EUS, and a DMA
interface to transfer data between the EUS and the UIM over the system bus. When transmitting into the
network, control information is received that describes a LUWU or SUWUs to be transmitted and information
about the EUS buffers where the data resides. The control information from the EUS includes destination
MAN address, destination group (virtual network), LUWU length, and type fields for type of service and
higher level protocol type. The DMA interface moves the user data over from the EUS buffers into the UIM.
The network interface portion is responsible for formatting the LUWUs and SUWUs into packets and
transmitting the packets on the link to the network. The control interface could have several variations for
flow control, multiple outstanding requests, priority, and preemption. The UIM is in control of the amount of
data that it takes from the EUS memory and sends into the network.

On the receive side, the EUS polls for information about packets that have been received, and the
control interface responds with LUWU information from the packet header and current information about
how much of the EUS transaction has arrived. Over the control interface, the EUS requests to receive data
from these messages, and the DMA interface will send the data from memory on the UIM into the EUS
memory buffers. The poll and response mechanism in the interface protocol on the receive side allows a lot
of EUS flexibility for receiving data from the network. The EUS can receive either partial or entire
transactions that have come from the source EUS. It also provides the flow control mechanism for the EUS
on receive: the EUS is in control of what it receives, when it receives it, and in what order.
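
The poll and response exchange can be pictured with a small Python sketch. It is illustrative only: the
class and method names (LuwuReception, poll, receive) are hypothetical and are not part of the MAN
interface definition. The sketch tracks how much of one LUWU has arrived at the UIM and lets the EUS
request any part of what has arrived so far.

```python
class LuwuReception:
    """Hypothetical bookkeeping for one LUWU being received at a UIM."""

    def __init__(self, total_length: int) -> None:
        self.total_length = total_length  # total LUWU length from the packet header
        self.arrived = 0                  # bytes buffered in the UIM so far
        self.delivered = 0                # bytes already moved to EUS memory by DMA

    def packet_arrived(self, nbytes: int) -> None:
        # A new packet of this LUWU has been stored in the UIM receive buffer.
        self.arrived = min(self.total_length, self.arrived + nbytes)

    def poll(self) -> dict:
        # The poll response tells the EUS how much of the transaction has arrived.
        return {
            "total": self.total_length,
            "arrived": self.arrived,
            "delivered": self.delivered,
        }

    def receive(self, requested: int) -> int:
        # The EUS decides how much to take; the UIM can only deliver what it holds.
        granted = max(0, min(requested, self.arrived - self.delivered))
        self.delivered += granted
        return granted
```

With this model an EUS that asks for more data than has arrived is simply granted the part that is
available and can ask again after the next poll, which is the pipelining behaviour described above.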



8.3.3.4 SUN Software

This section describes how a typical end user system, a SUN-3 workstation, is connectable to MAN.
Other end user systems would use different software. The interface to MAN is relatively straightforward and
efficient for a number of systems which have been studied.



8.3.3.4.1 Existing Network Software

The Sun UNIX® operating system is derived from the 4.2BSD UNIX system from the University of
California at Berkeley. Like 4.2BSD, it contains, as part of the kernel, an implementation of the ARPAnet
protocols: internet protocol (IP), transmission control protocol (TCP) for connection-oriented service on top
of IP, and user datagram protocol (UDP) for connectionless service on top of IP. Current Sun systems use
IP as an internet sublayer in the top half of the network layer. The bottom half of the network layer is a
network specific sublayer. It currently consists of driver level software that interfaces to a specific network
hardware connection, namely an ETHERNET controller, where the link layer MAC protocol is implemented.
ETHERNET is the network currently used to connect Sun workstations. To connect Sun workstations with a
MAN network, it is necessary to fit into the framework of this existing networking software. The software for
the MAN network interface in the Sun will be driver level software.

The MAN network is naturally a connectionless or datagram type of network. LUWU data with control
information forms the EUS transaction crossing the interface into the network. Existing network services can
be provided using the MAN network datagram LUWUs as a basis. Software in the Sun will build up both
connectionless and connection-oriented transport and application services on top of a MAN datagram
network layer. Since the Sun already has a variety of network application software, the MAN driver will
provide a basic service with the flexibility to multiplex multiple upper layers. This multiplexing capability will
be necessary not just for existing applications but for additional new applications that will use MAN's power
more directly.

There needs to be an address translation service function in the EUS at the driver level in the host
software. It would allow IP addresses to be translated into MAN addresses. The address translation
service is similar in function to the current Sun address resolution protocol (ARP), but different in
implementation. If a particular EUS needs to update its address translation tables, it sends a network
message with an IP address to a well known address translation server. The corresponding MAN address
will be returned. With a set of such address translation services, MAN can then act as the underlying
network for many different, new and existing, network software services in the Sun environment.
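
As a rough illustration of this translation step, the following Python sketch keeps a local table and asks
a translation server only on a miss. The class name AddressTranslator, the query_server callback, and the
invalidate method are assumptions made for the example; the text does not define the message exchange
with the server.

```python
from typing import Callable, Dict


class AddressTranslator:
    """Hypothetical driver-level cache of IP -> MAN address mappings."""

    def __init__(self, query_server: Callable[[str], int]) -> None:
        # query_server stands in for the network message sent to the
        # well known address translation server (an assumption of this sketch).
        self._query_server = query_server
        self._table: Dict[str, int] = {}

    def to_man_address(self, ip_address: str) -> int:
        # Consult the local table first; ask the server only when needed.
        if ip_address not in self._table:
            self._table[ip_address] = self._query_server(ip_address)
        return self._table[ip_address]

    def invalidate(self, ip_address: str) -> None:
        # Forget a mapping so that the next lookup refreshes it from the server.
        self._table.pop(ip_address, None)
```

A driver built along these lines would call to_man_address for each outgoing LUWU from an ARPAnet
protocol layer, while software that already uses MAN addresses would bypass it.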



8.3.3.4.2 Device Driver

On the top side, the driver multiplexes several different queues of LUWUs from the higher protocols
and applications for transmission, and queues up received LUWUs in several different queues for the higher
layers. On the hardware side, the driver sets up DMA transfers to and from user memory buffers. The driver
must communicate with the system to map user buffers into memory that can be accessed by the DMA
controller over the main system bus.

On transmit, the driver must do address translation on the outgoing LUWUs for those protocol layers
that are not using MAN addresses, i.e., the ARPAnet protocols. The MAN destination address and
destination group are included in MAN datagram control information that is sent when a LUWU is to be
transmitted. Other transmit control information will be LUWU length and fields indicating type of service
and higher level protocol, along with the data location for DMA. The UIM uses this control information to
form packet headers and to move the LUWU data out of EUS memory.

On receive, the driver will implement a poll/response protocol with the UIM notifying the EUS of
incoming data. The poll response will contain control information that gives source address, total LUWU
length, amount of data that has arrived up to this point, the type fields indicating higher protocol layers, and
some agreed on amount of the data from the message. (For small messages, the whole user message
could arrive in this poll response.) The driver itself has the flexibility, based on the type field, to decide how
to receive this message and which higher level entity to pass it on up to. It may be that, based on a certain
type field, it just delivers the announcement and passes the reception decision on up to a higher layer.
Whichever approach is used, eventually a control request for the delivery of the data from the UIM to the
EUS memory is made, which results in a DMA operation by the UIM. EUS buffers to receive the data may
be preallocated for the protocol types where the driver handles the reception in a fixed fashion, or the driver
may have to get buffer information from a higher layer in the case where it has just passed the
announcement on up. This is the type of flexibility we need in the driver to handle both existing and new
applications in the Sun environment.


8.3.3.4.3 Raw MAN Interface Software

Later, as applications are written that wish to directly use the capabilities of the MAN network, the
address translation function will not be necessary. The MAN datagram control information will be specified
directly by special MAN network layer software.



9 MAN Protocols



9.1 Overview

The MAN protocol provides for the delivery of user data from source UIM across the network to
destination UIM. The protocol is connectionless, asymmetric for receive and send, implements error
detection without correction, and discards layer purity for high performance.



The EUS sends datagram transactions called LUWUs into the network. The data that comes from the
EUS resides in EUS memory. A control message from the EUS specifies to the UIM the data length, the
destination address for this LUWU, the destination group and a type field which could contain information
like the user protocol and the network class of service required. Together, the data and the control
information form the LUWU. Depending on the type of EUS interface, this data and control can be passed
to the UIM in different ways, but it is likely that the data is passed in a DMA transfer.

The UIM will transmit this LUWU into the network. To reduce potential delay, larger LUWUs are not sent
into the network as one contiguous stream. The UIM breaks up the LUWU into fragments called packets
that can be up to a certain maximum size. A UWU smaller than the maximum size is called a SUWU and
will be contained in a single packet. Several EUSs are concentrated at the NIM, and packets are transmitted
over the link from the UIM to the NIM (the EUSL). Packets from one UIM can be demand multiplexed on
the link from the NIM to the MINT (the XL) with packets from other EUSs. Delays are reduced because no
EUS has to wait for the completion of a long LUWU from another EUS sharing the link to the MINT. The
UIM generates a header for every packet that contains information from the original LUWU transaction, so
that each packet can pass through the network from source UIM to destination UIM and be recombined into
the same LUWU that was passed into the network by the source EUS. The packet header contains the
information for the network layer protocol in the MAN network.
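
The fragmentation and recombination just described can be sketched in a few lines of Python. This is a
simplified model under stated assumptions: the Packet fields shown are a small subset chosen for the
example, the max_data parameter stands in for the unspecified maximum packet size, and the function
names fragment and recombine are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    luwu_id: int      # identifies the LUWU this fragment belongs to
    luwu_length: int  # total length of the original LUWU data in bytes
    offset: int       # position of this fragment's first byte within the LUWU
    data: bytes


def fragment(luwu_id: int, data: bytes, max_data: int) -> List[Packet]:
    """Split LUWU data into packets carrying at most max_data bytes each."""
    if max_data <= 0:
        raise ValueError("max_data must be positive")
    # An empty transaction still yields one (empty) packet.
    offsets = range(0, max(len(data), 1), max_data)
    return [
        Packet(luwu_id, len(data), off, data[off:off + max_data])
        for off in offsets
    ]


def recombine(packets: List[Packet]) -> bytes:
    """Rebuild the LUWU at the destination from the header information."""
    buffer = bytearray(packets[0].luwu_length)
    for packet in packets:
        buffer[packet.offset:packet.offset + len(packet.data)] = packet.data
    return bytes(buffer)
```

Because every packet carries the total length and its own offset, the destination can place each
fragment without any state other than the packet headers themselves.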

Before the NIM sends the packet to the MINT on the XL, it adds a NIM/MINT header to the packet
message. The header contains the source port number identifying the physical port on the NIM where a
particular EUS/UIM is connected. This header is used by the MINT to verify that the source EUS is located
at the port where it is authorized to be. This type of additional check is especially important for a data
network that serves one or more virtual networks, to ensure privacy for such virtual networks. The MINT
uses the packet header to determine the route for the packet, as well as other potential services. The MINT
does not change the contents of the packet header. When the ILH in the MINT passes the packet out
through the switch to be sent out on the XL to the destination NIM, it places a different port number in the
NIM/MINT header. This port number is the physical port on the NIM where the destination EUS/UIM is
connected. The destination NIM uses this port number to route the packet on the fly to the proper EUSL.
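
A minimal Python sketch of this port check and port replacement follows. The table layouts and the
function name mint_route are assumptions for illustration; the actual MINT tables and processing are
described elsewhere in the specification.

```python
from typing import Dict, Optional, Tuple


def mint_route(
    packet_port: int,
    source: int,
    destination: int,
    authorized_port: Dict[int, int],
    location: Dict[int, Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Return (destination NIM, destination port), or None if the packet is refused.

    authorized_port maps a source MAN address to the NIM port it may use.
    location maps a destination MAN address to its (NIM, port) pair.
    Both tables are hypothetical stand-ins for the MINT's routing data.
    """
    # The port number was prepended by the NIM, so the EUS cannot forge it.
    if authorized_port.get(source) != packet_port:
        return None
    # The packet header itself is left unchanged; only the NIM/MINT header
    # is rewritten to carry the destination port.
    return location.get(destination)
```

A packet whose source address is registered on port 3 but which arrives tagged with port 5 is refused,
which is the aliasing protection the port number is intended to give.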

The various sections of a packet are identified by delimiters according to the link format. Such
delimiters occur between the NIM/MINT header 600 and the MAN header 610, and between the MAN
header and the rest of the packet. The delimiter at the MAN header/rest of packet border is required to
signal the header check sequence circuit to insert or check the header check. The NIM broadcasts a
received packet to all ports listed in the NIM/MINT header field.

When the packet arrives at the destination UIM, the packet header contains the original information from
the source UIM necessary to reassemble the source EUS transaction. There is also enough information to
allow a variety of EUS receive interface approaches including pipelining or other variations of EUS
transaction size, prioritization, and preemption.



9.3 Description



9.3.1 Link Layer Functions 

The link functions are described in Section 5. The functions of message beginning and end demarcation,
data transparency, and message check sequences on the EUSL and XL links are discussed there.

A check sequence for the whole packet message is performed at the link level, but instead of corrective
action being taken there, an indication of the error is passed on up to the network layer for handling there. A
message check sequence error results only in incrementing an error count for administrative purposes, but
the message transmission continues. A separate header check sequence is calculated in hardware in the
UIM. A header check sequence error detected by the MINT control results in the message being thrown
away and an error count being incremented for administrative purposes. At the destination UIM a header
check sequence error also results in the message being thrown away. The data check sequence result can
be conveyed to the EUS as part of the LUWU arrival notification, and the EUS can determine whether or not
to receive the message. These violations of layer purity have been made to simplify the processing at the
link layer to increase speed and overall network performance.
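
The difference in treatment between the two check sequences at the MINT can be summarized in a short
Python sketch. The names ErrorCounters and mint_handle are hypothetical, and the two boolean inputs
stand in for the results of the check circuits.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounters:
    """Administrative error counts (field names are assumptions)."""
    header_errors: int = 0
    message_errors: int = 0


def mint_handle(header_ok: bool, message_ok: bool, counters: ErrorCounters) -> bool:
    """Return True if the MINT forwards the packet, False if it is thrown away."""
    if not header_ok:
        # A bad header cannot be trusted for routing: count it and discard.
        counters.header_errors += 1
        return False
    if not message_ok:
        # A bad message check is only counted; transmission continues.
        counters.message_errors += 1
    return True
```

The decision on a packet with damaged data is thus left to the receiving EUS, which learns the data check
result with the arrival notification.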

Other "standard" link layer functions like error correction and flow control are not performed in the
conventional manner. There are no acknowledgement messages returned at the link level for error
correction (retransmission requests) or for flow control. Flow control is signaled using special bits in the
framing pattern. The complexity of X.25-like protocols at the link level can be tolerated for low speed links
where the processing overhead will not reduce performance and does increase the reliability of links that
have high error rates. However, it is felt that an acceptable level of error-free throughput will be achieved by
the low bit error rates in the fiber optic links in this network (bit error rate less than 10 errors per trillion
bits). Also, because of the large amounts of buffer memory in the MINT and the UIM necessary to handle
data from the high-speed links, it was felt that flow control messages would not be necessary or effective.



9.3.2 Network Layer



9.3.2.1 Functions 

The message unit that leaves the source UIM and travels all the way to the destination UIM is the
packet. The packet is not altered once it leaves the source UIM.

The information in the UIM to UIM message header will allow the following functions to be performed:
- fragmentation of LUWUs at the source UIM,
- recombination of LUWUs at the destination UIM,
- routing to the proper NIM at the MINT,
- routing to the proper UIM/EUS port at the destination NIM,
- MINT transmission of variable length messages (e.g., SUWU, packet, n packets),
- destination UIM congestion control and arrival announcement,
- detection and handling of message header errors,
- addressing of network entities for internal network messages,
- EUS authentication for delivery of network services only to authorized users.



9.3.2.2 Format 

FIG. 20 shows the UIM to MINT message format. The MAN header 610 consists of the Destination
Address 612, the Source Address 614, the group (virtual network) identifier 616, group name 618, the type
of service 620, the Packet Length (the header plus data in bytes) 622, a type of service indicator 623, a
protocol identifier 624 for use by end user systems for identifying the contents of EUS to EUS header 630,
and the Header Check Sequence 626. The header is of fixed length, seven 32-bit words or 224 bits long.
The MAN header is followed by an EUS to EUS header 630 to process message fragmentation. This
header includes a LUWU identifier 633, a LUWU length indicator 634, the packet sequence number 636, the
protocol identifier 638 for identifying the contents of the internal EUS protocol which is the header of user
data 640, and the number 639 of the initial byte of data of this packet within the total LUWU of information.
Finally, user data 640 may be preceded for appropriate user protocols by the identity of the destination port
642 and source port 644. The fields are 32 bits because that is the most efficient length (integers) for
present network control processors. Error checking is performed on the header in control software; this is
the Header Check Sequence. At the link level, error checking is done over the whole message; this is the
Message Check Sequence 646. The NIM/MINT header 600 (explained below) is also shown in the figure for
completeness.

The destination address, group identification, type of service, and the source address are placed as the
first five fields in the message for efficiency in MINT processing. The destination and group identification
are used for routing, the size for memory management, the type fields for special processing, and the
source is used for service authentication.


9.3.2.2.1 Destination Address 

The Destination Address 612 is a MAN address that specifies to which EUS the packet is being sent. A
MAN address is 32 bits long and is a flat address that specifies an EUS connected to the network. (In
internal network messages, if the high order bit in the MAN address is set, the address specifies an internal
network entity like a MINT or NIM, instead of an EUS.) A MAN address will be permanently assigned to an
EUS and will identify an EUS even if it moves to a different physical location on the network. If an EUS
moves, it must sign in with a well-known routing authentication server to update the correspondence
between its MAN address and the physical port on which it is located. Of course, the port number is
supplied by the NIM so the EUS cannot cheat about where it is located.
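
The high order bit convention for internal network messages can be expressed directly. The constant and
function names in this Python sketch are chosen for the example only.

```python
INTERNAL_ENTITY_BIT = 0x80000000  # high order bit of a 32-bit MAN address


def is_internal_entity(man_address: int) -> bool:
    """True if the address names an internal entity such as a MINT or NIM."""
    if not 0 <= man_address <= 0xFFFFFFFF:
        raise ValueError("a MAN address is 32 bits long")
    return bool(man_address & INTERNAL_ENTITY_BIT)
```

Because the address space is flat, no other structure is assumed: every address with the high order bit
clear is treated as naming an EUS.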

In the MINT the destination address will be used to determine a destination NIM for routing the
message. In the destination NIM the destination address will be used to determine a destination UIM for
routing the message.



9.3.2.2.2 Packet Length

The Packet Length 622 is 16 bits long and represents the length in bytes of this message fragment
including the fixed length header and the data. This length is used by the MINT for transmitting the
message. It is also used by the destination UIM to determine the amount of data available for delivery to the
EUS.



9.3.2.2.3 Type Fields

The type of service field 623 is 13 bits long and contains the type of service specified in the original
EUS request. The MINT may look at the type of service and handle the message differently. The






various streams of data from the network, 

9.3.2.2.4 Packet Sequence Number

This is a Packet Sequence Number 636 for this particular LUWU transmission. It helps the receiving
UIM recombine the incoming LUWU, so that it can determine if any fragments of the transmission have
been lost because of error. The sequence number is incremented for each fragment of the LUWU. The last
sequence number is negative to indicate the last packet of a LUWU. (An SUWU would have -1 as the
sequence number.) If an infinite length LUWU is being sent, the Packet Sequence Number should wrap
around. (See UWU Length, Section 9.3.2.2.7, for an explanation of an infinite length LUWU.)
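
One way to read this numbering rule is sketched here in Python. The sketch assumes that numbering starts
at 1 and that the final fragment carries the negated fragment count, which is consistent with an SUWU
having -1 but is not stated explicitly in the text; wrap-around for infinite length LUWUs is not modelled.
The function names are hypothetical.

```python
from typing import List


def sequence_numbers(fragment_count: int) -> List[int]:
    """Sequence numbers for a finite LUWU split into fragment_count packets."""
    if fragment_count < 1:
        raise ValueError("a LUWU has at least one packet")
    # 1, 2, ..., n-1 followed by -n for the last packet (assumed convention).
    return list(range(1, fragment_count)) + [-fragment_count]


def lost_fragment(received: List[int]) -> bool:
    """True if the received sequence shows a gap or never reaches the last packet."""
    expected = 1
    for seq in received:
        if abs(seq) != expected:
            return True   # a fragment is missing before this one
        if seq < 0:
            return False  # the last packet arrived with no gaps before it
        expected += 1
    return True           # the negative (final) sequence number was never seen
```

Since fragments cannot get out of order in the network, a simple running counter such as this is enough
for the receiving UIM to notice a lost fragment.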

9.3.2.2.5 Source Address

The Source Address 614 is 32 bits long and is a MAN address that specifies the EUS that sent the
message. (See Destination Address for an explanation of MAN address.) The Source Address will be
used in the MINT for network accounting. Coupled with the Port Number 600 from the NIM/MINT header,
it is used by the MINT to authenticate the source EUS for network services. The Source Address will be
delivered to the destination EUS so that it knows the network address of the EUS that sent the message.

9.3.2.2.6 UWU ID

The UWU ID 633 is a 32 bit number that is used by the destination UIM to recombine a UWU. Note that
the recombination is made easier because fragments cannot get out of order in the network. The UWU
ID, along with the Source and Destination Addresses, identifies packets of the same LUWU, or in other
words, fragments of the original datagram transaction. The ID must be unique for the source and destination
pair for the time that any fragment is in the network.

9.3.2.2.7 UWU Length

The UWU Length 634 is 32 bits long and represents the total length of UWU data in bytes. In the first
packet of a LUWU this will allow the destination UIM to do congestion control, and if the LUWU is pipelined
into the EUS, it will allow the UIM to begin a LUWU announcement and delivery before the complete LUWU
has arrived.

A length that is negative indicates an infinite length LUWU, which is like an open channel between two
EUSs. Closing down an infinite length LUWU is done by sending a negative Packet Sequence Number. An
infinite length LUWU only makes sense where the UIM controls the DMA into EUS memory.

9.3.2.2.8 Header Check Sequence

There is a header check sequence 626, calculated by the transmitting UIM for header information so
that the MINT and the destination UIM can determine if the header information was received correctly. The
MINT or the destination UIM will not attempt delivery of a packet with a header check sequence error.

9.3.2.2.9 User Data

The user data 640 is the portion of the user UWU data that is transmitted in this fragment of the
transmission. Following the data is the overall message check sequence 646 calculated at the link level.



9.3.3 NIM/MINT Layer



9.3.3.1 Functions

This protocol layer consists of a header containing a NIM port number 600. This port number has a one
to one correspondence to an EUS connection on the NIM and is prepended by the NIM in block 403 (FIG.
16) so that the user cannot enter false data therein. This header is positioned at the front of a packet
message and is not covered by the overall packet message check sequence. It is checked by a group of
parity bits in the same word to enhance its error reliability. The incoming message to the MINT contains the
source NIM port number to assist in user authentication for network services that might be requested in the
type fields. The outgoing message from the MINT contains the destination NIM port number in place of the
source port 600 in order to speed the demultiplexing/routing by the NIM to the proper destination EUS. If
the packet has a plurality of destination ports in one NIM, a list of these ports is placed at the beginning of
the packet so that section 600 of the header becomes several words long.



10 LOGIN PROCEDURES AND VIRTUAL NETWORKS



10.1 General

A system such as MAN is naturally most cost effective when it can serve a large number of customers.
Such a large number of customers is likely to include a number of sets of users who require protection from
outsiders. Such users can conveniently be grouped into virtual networks. In order to provide still further
flexibility and protection, individual users may be given access to a number of virtual networks. For
example, all the users of one company may be on one virtual network and the payroll department of that
company may be on a separate virtual network. The payroll department users should belong to both of
these virtual networks since they may need access to general data about the corporation, but the users
outside the payroll department should not be members of the payroll department virtual network since
they should not have access to payroll records.

The login procedure, the method of source checking, and the method of routing are the arrangements
which permit the MAN system to support a large number of virtual networks while providing an optimum
level of protection against unauthorized data access. Further, the arrangement whereby the NIM prepends
the user port to every packet gives additional protection against access of a virtual network by an
unauthorized user by preventing aliasing.



10.2 Building Up the Authorization Data Base

FIG. 15 illustrates the administrative control of the MAN network. A data base is stored in disk 351,
accessed via operation, administration, and maintenance (OA&M) system 350, for authorizing users in
response to a login request. For a large MAN network, OA&M system 350 may be a distributed
multiprocessor arrangement for handling a large volume of login requests. This data base is arranged so
that users cannot access restricted virtual networks of which they are not members. The data base is under
the control of three types of super users. A first super user would in general be an employee of the
common carrier that is supplying MAN service. This super user, referred to for convenience herein as a
level 1 super user, assigns a block of MAN names, which would in general consist of a block of numbers, to
each user group and assigns type 2 and type 3 super users to particular ones of these names. The level 1
super user also assigns virtual networks to particular MAN groups. Finally, a level 1 super user has the
authority to create or destroy a MAN supplied service such as electronic "yellow page" service. A type 2
super user assigns valid MAN names from the block assigned to the particular user community, and
assigns physical port access restrictions where appropriate. In addition, a type 2 super user has the
authority to restrict access to certain virtual networks by sets of members of his customer community.

Type 3 super users, who are broadly equal in authority to type 2 super users, have the authority to grant
MAN names access to their virtual networks. Note that such access can only be granted by a type 3 super
user if the MAN name's type 2 super user has allowed this MAN name user the capability of joining this
group by an appropriate entry in table 370.

The data base includes table 360, which provides, for each user identification 362, the password 361, the
group 363 accessible using that password, a list of ports and, for special cases, directory numbers 364 from
which that user may transmit and/or receive, and the type of service 365, i.e., receive only, transmit only, or
receive and transmit.

The data base also includes user-capability tables 370, 375 for relating users (table 370) to groups (table
375) potentially authorizable for each user. When a user is to be authorized by a super user to access a
group, this table is checked to see if that group is in the list of table 370; if not, the request to authorize that
user for that group will be rejected. Super users have authority to enter data for their group and their groups
in tables 370, 375. Super users also have the authority for their user to move a group from table 375 into the
list of groups 363 of the user/group authorization table 360. Thus, for a user to access an outside group,
super users from both groups would have to authorize this access.
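
The two-sided authorization described above can be illustrated with a minimal Python sketch. The class and method names (`AuthorizationDataBase`, `allow_capability`, `grant_access`, `may_login`) and the sample user and group names are hypothetical and do not appear in the specification. The user-capability tables 370, 375 are modeled simply as a set of groups that a user may potentially join, and the group list 363 of table 360 as a set of groups the user may actually log in to.

```python
class AuthorizationDataBase:
    """Toy model of tables 360 and 370/375 (all names are illustrative assumptions)."""

    def __init__(self):
        self.capabilities = {}  # user -> groups potentially authorizable (tables 370/375)
        self.authorized = {}    # user -> groups the user may log in to (list 363, table 360)

    def allow_capability(self, user, group):
        # Step taken by the super user responsible for the user.
        self.capabilities.setdefault(user, set()).add(group)

    def grant_access(self, user, group):
        # Step taken by the super user of the target group.
        # The request is rejected unless the capability entry already exists.
        if group not in self.capabilities.get(user, set()):
            return False
        self.authorized.setdefault(user, set()).add(group)
        return True

    def may_login(self, user, group):
        return group in self.authorized.get(user, set())


db = AuthorizationDataBase()
rejected = db.grant_access("user-a", "group-x")  # no capability entry yet
db.allow_capability("user-a", "group-x")
granted = db.grant_access("user-a", "group-x")   # both super users have now agreed
print(rejected, granted, db.may_login("user-a", "group-x"))
```

The point of the sketch is only that neither super user alone can admit a user to an outside group; how the real tables are laid out on disk 351 is not modeled here.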



10.3 Login Procedure 



At login time, a user who has previously been appropriately authorized according to the arrangements
described above sends an initial login request message to the MAN network. This message is destined not
for any other user, but for the MAN network itself. Effectively, this message is a header only message which
is analyzed by the MINT central control. The password, type of login service being requested, MAN group,
MAN name and port number are all in the MAN header of a login request, replacing other fields. This is
done because only the header is passed by the XLH to the MINT central control, for further processing by
the OA&M central control. The login data, which includes the MAN name, the requested MAN group name
(virtual network name), and the password, are compared against the login authorization data base 361 to
check whether the particular user is authorized to access that virtual network from the physical port to which
that user is connected (the physical port was prepended by the NIM prior to reception of the login packet
by the MINT). If the user is in fact properly authorized, then the tables in source checker 307 and in router
309 (FIG. 14) are updated. Only the source checker table of the checker that processes the login user's port
is updated from a login for terminal operations. If a login request is for receive functions, then the routing
tables of all MINTs must be updated to allow that source to receive data from any authorized connectable
user of the same group who may be connected to other MINTs to respond to requests. The source checker
table 308 includes a list of authorized name/group pairs for each port connected to the NIM that sends the
data stream to the XLH for that source checker. The router tables 310 all include entries for all users
authorized to receive UWUs. Each entry includes a name/group pair, and the corresponding NIM and port
number. The entries in the source checker list are grouped by group identification numbers. The group
identification number 616 is part of the header of subsequent packets from the logged in user, and is
derived by the OA&M system 350 at login time and sent back by the OA&M system via the MAN switch 10
to the login user. The OA&M system 350 uses the MINT central control's 20 access 19 to the MINT
memory 18 to enter the login acknowledge to the login user. On subsequent packets, as they are received
in the MINT, the source checker checks the port number, MAN name and MAN group against the
authorization table in the source checker with the result that the packet is allowed to proceed or not. The
router then checks to see if the destination is an allowable destination for that input by checking the virtual
network group name and the destination name. As a result, once a user is logged in, the user can reach
any destination that is in the routing tables, i.e., that has previously logged in for access in the read only
mode or the read/write mode, and that has the same virtual network group name as requested in the login.
In contrast, unauthorized users are blocked in every packet.
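
The per-packet checking described above can be sketched in Python as follows. This is an illustrative model only: the table contents, the user names, and the function name `check_packet` are assumptions, and the real source checker 307 and router 309 operate on packet headers in hardware-assisted tables rather than Python dictionaries.

```python
# Source checker table (cf. table 308): for each port, the authorized (name, group) pairs.
source_table = {7: {("alice", "eng")}}

# Router table (cf. table 310): (name, group) -> (NIM, port) for users logged in to receive.
router_table = {("bob", "eng"): (3, 12)}


def check_packet(port, src_name, group, dst_name):
    """Return the (NIM, port) to route to, or None if the packet is blocked."""
    # Source check: the port prepended by the NIM, the MAN name and the MAN group
    # must together match an entry created at login time.
    if (src_name, group) not in source_table.get(port, set()):
        return None
    # Routing check: the destination must be logged in under the same group.
    return router_table.get((dst_name, group))


print(check_packet(7, "alice", "eng", "bob"))  # authorized source and destination
print(check_packet(8, "alice", "eng", "bob"))  # same name on a different port is blocked
```

Because the port number is part of the lookup key, a user who presents another user's name from a different port fails the source check, which is the aliasing protection referred to in the text.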

While in the present embodiment the checking is done for each packet, it could also be done for each
user work unit (LUWU or SUWU), with a recorded indication that all subsequent packets of a LUWU whose
original packet was rejected are also to be rejected, or by rejecting all LUWUs whose initial packet is
missing at the user system.

Those super user logins which are associated with making changes in the login data base are checked
in the same way as conventional logins except that it is recognized in OA&M system 350 as a login request
for a user who has authority for changing the data base stored on disk 351.

Super users types 2 and 3 get access to the OA&M system 350 from a computer connected to a user
port of MAN. OA&M system 350 derives statistics on billing, usage, authorizations and performance which
the super users can access from their computers.

The MAN network can also serve special types of users such as transmit only users and receive only
users. An example of a transmit only user is a broadcast stock quotation system or a video transmitter.
Outputs of transmit only users are only checked in source checker tables. Receive only units such as
printers or monitoring devices are authorized by entries in the routing tables.






11 APPLICATION OF MAN TO VOICE SWITCHING

FIG. 22 shows an arrangement for using the MAN architecture to switch voice as well as data. In order
to simplify the application of this architecture to such services, an existing switch, in this case the 5ESS®
switch manufactured by AT&T Network Systems, is used. The advantage of using an existing switch is that
it avoids the necessity for developing a program to control a local switch, a very large development effort.
By using an existing switch as the interface between the MAN and voice users, this effort can be almost
completely eliminated. Shown on FIG. 22 is a conventional customer telephone connected to a switching
module 1207 of 5ESS switch 1200. This customer telephone could also be a combined integrated services
digital network (ISDN) voice and data customer station which can also be connected to a 5ESS switch.
Other customer stations 1202 are connected through a subscriber loop carrier system 1203 which is
connected to a switching module 1207. The switching modules 1207 are connected to a time multiplex
switch 1209 which sets up connections between switching modules. Two of these switching modules are
shown connected to an interface 1210 comprising Common Channel Signaling 7 (CCS 7) signaling channels
1211, pulse code modulation (PCM) channels 1213, and special signaling channels 1215. These are
connected to a packet assembler and disassembler (PAD) 1217 for interfacing with a MAN NIM 2. The function
of the PAD is to interface between the PCM signals which are generated in the switch and the packet
signals which are switched in the MAN network. The function of the special signaling channel 1215 is to
inform PAD 1217 of the source and destination associated with each PCM channel. The CCS 7 channels
transmit packets which require further processing by PAD 1217 to get them into the form necessary for
switching by the MAN network. To make the system less vulnerable to the failure of equipment or
transmission facilities, the switch is shown as being connected to two different NIMs of the MAN network. A
digital PBX 1219 also interfaces with packet assembler disassembler 1217 directly. In a subsequent
upgrade of the PAD, it would be possible to interface directly with SLC 1203 or with telephones such as
integrated services digital network (ISDN) telephones that generate a digital voice bit stream directly.

The NIMs are connected to a MAN hub 1230. The NIMs are connected to MINTs 11 of that hub. The
MINTs 11 are interconnected by MAN switch 22.

For this type of configuration, it is desirable to switch substantial quantities of data as well as voice in
order to utilize the capabilities of the MAN hub most effectively. Voice packets, in particular, have very short
delay requirements in order to minimize the total delay encountered in transmitting speech from a source to
a destination and in order to ensure that there is no substantial interpacket gap which would result in the
loss of a portion of the speech signal.

The basic design parameters for MAN have been selected to optimize data switching, and have been
adapted in a most straightforward manner as shown in FIG. 22. If a large amount of voice packet switching
is required, one or more of the following additional steps can be taken:

1. A form of coding such as adaptive differential PCM (ADPCM), which offers excellent performance
at 32 Kbit/second, could be used instead of 64 Kbit PCM. Excellent coding schemes are also available
which require fewer than 32 Kbit/sec. for good performance.

2. Packets need only be sent when a customer is actually speaking. This reduces the number of
packets that must be sent by at least 2:1.

3. The size of the buffer for buffering voice samples could be increased above the storage for 256
voice samples (a two packet buffer) per channel. However, longer voice packets introduce more delay
which may or may not be tolerable depending on the characteristics of the rest of the voice network.

4. Voice traffic might be concentrated in specialist MINTs to reduce the number of switch setup
operations for voice packets. Such an arrangement may enlarge the number of customers affected by a
failure of a NIM or MINT and might require arrangements for providing alternate paths to another NIM
and/or MINT.

5. Alternate hub configurations can be used.

The alternate hub configuration of FIG. 24 is an example of a step 5 solution. A basic problem of
switching voice packets is that, in order to minimize delay in transmitting voice, the voice packets must
represent only a short segment of speech, as low as 20 milliseconds according to some estimates. This
corresponds to as many as 50 packets per second for each direction of speech. If a substantial fraction of
the input to a MINT represented such voice packets, the circuit switch setup time might be too great to
handle such traffic. If only voice traffic were being switched, a packet switch which would not require circuit
setup operations might be needed for high traffic situations.
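
The relation between speech segment length and packet rate stated above can be checked with a short Python calculation. The function names are hypothetical, and the 8000 samples per second rate is an assumption based on conventional 64 Kbit PCM (8 bits per sample); the specification itself states only the 20 millisecond and 50 packets per second figures.

```python
PCM_SAMPLES_PER_SECOND = 8000  # assumption: conventional 64 Kbit/s PCM, 8 bits per sample


def packets_per_second(segment_ms):
    """Packets per second, per direction, when each packet carries segment_ms of speech."""
    return 1000 / segment_ms


def samples_per_packet(segment_ms):
    """Number of PCM samples carried in one packet of segment_ms of speech."""
    return PCM_SAMPLES_PER_SECOND * segment_ms // 1000


print(packets_per_second(20))   # 50.0
print(samples_per_packet(20))   # 160
```

Halving the segment length doubles the number of packets, and hence the number of circuit switch setups, which is why short voice packets stress the setup time of the circuit switch.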

One embodiment of such a packet switch 1300 comprises a group of MINTs 1313 interconnected like a
conventional array of space division switches wherein each MINT 1313 is connected to four others, and
enough stages are added to reach all output MINTs 1312 that carry heavy voice traffic. For added
protection against equipment failure, the MINTs 1313 of the packet switch 1300 could be interconnected
through MANS 10 in order to route traffic around a defective MINT 1313 and to use a spare MINT 1313
instead.

The output bit stream of NIM 2 is connected to one of the inputs (XL) of an input MINT 1311. The
packet data traffic leaving input MINT 1311 can continue to be switched through MANS 10. In this
embodiment, the data packet output of MANS 10 is merged with the voice packet output of data switch
1300 in an output MINT 1312 which receives the outputs of MANS 10 and data switch 1300 on the XL 16
(input) side and whose IL 17 output is the input bit stream of NIM 2, produced by a PASC circuit 290 (FIG.
13). Input MINT 1311 does not contain the PASC circuit 290 (FIG. 13) for generating the output bit stream
to NIM 2. For output MINT 1312 the inputs to the XLs from MANS 10 pass through a phase alignment
circuit 292 (FIG. 13) such as that shown in FIG. 25, since such inputs come from many different sources
through circuit paths that insert different delay.

This arrangement can also be used for switching high priority data packets through the packet switch
1300 while retaining the circuit switch 10 for switching low priority data packets. With this arrangement, it is
not necessary to connect the packet switch 1300 to output MINTs 1312 carrying no voice traffic; in that
case, high priority packets to MINTs carrying no voice traffic would have to be routed through circuit switch
MANS 10.

FIG. 26 shows another alternate configuration. In this configuration, while data packets are switched
once through the circuit switch as previously described, voice packets are switched twice through the space
division switch. In FIG. 26, the MINTs 11 are broken down into two groups. The first group, consisting of
MINT 11-0 through MINT 11-239, are used in the conventional way and have both voice and data packet
inputs from the NIMs to which they are connected by a link 3. When one of the MINTs 11-0,...,11-239
recognizes a voice packet, it prepares to send that voice packet through the circuit switch MANS 10 to one
of 16 specialist voice packet switch modules, MINTs 11-240,...,11-255. Each of the MINTs 11-0,...,11-239
can then assemble voice packets in only 16 different groups, one group for each of the voice packet
switching modules, MINTs 11-240,...,11-255, so that any circuit connection from one of the MINTs
11-0,...,11-239 can carry voice packets destined for 1/16th of the 960 NIMs connected to the 240 voice and
data packet switch modules.

A voice packet or a chained series of voice packets destined for one of the voice packet switch
modules, MINTs 11-240,...,11-255, is connected from the output of MANS 10 to an input of such a MINT.
The voice packet switch MINT then separates each incoming packet stream into 15 possible destinations
and assembles voice packets received from any of the voice and data packet switch modules, MINTs
11-0,...,11-239, for each of the 15 destinations (NIMs) served by each of the voice packet switch modules,
MINTs 11-240,...,11-255. Each of the latter MINTs then transmits a chain of packets for each of the 15 NIMs
served by that MINT through MANS 10 to the one of the outlets of MANS 10 that is connected to the
correct destination NIM.

This arrangement sharply reduces the number of connections that must be set up through MANS 10 for
transmitting voice packets since each voice and data packet MINT has only 16 voice packet destinations
(MINTs 11-240,...,11-255) and each voice packet switch MINT, 11-240,...,11-255, has only 15 destinations,
i.e., the 15 NIMs that it serves. This is in contrast to a comparable single stage arrangement whereby each
voice and data packet switch module must set up connections to up to 960 different NIMs.



FIG. 21 illustrates one arrangement for controlling access by MINTs 11 to the MAN switch control 22.
Each MINT has an associated access controller 1120. A data ring 1102, 1104, 1106 distributes data indicating
the availability of output links to each logic and count circuit 1100 of each access controller. Each access
controller 1120 maintains a list 1110 of output links such as 1112 to which it wants to send data, each link
having an associated priority indicator 1114. A MINT can seize an output link of that list by marking the link
unavailable in ring 1102 and transmitting an order to the MAN switch control 22 to set up a path from an
ILH of that MINT to the requested output link. When the full data block to be transmitted to that output link
has been so transmitted, the MINT marks the output link available in the data transmitted by data ring 1102,
which thereby makes that output link available for access by other MINTs.

A problem with using only availability data is that during periods of congestion the time before a
particular MINT may get access to an output link can be excessive. In order to even the accessibility of any
output link to any MINT, the following arrangement is used. Associated with each link availability indication,
called a ready bit, transmitted in ring 1102, is a window bit transmitted in ring 1104. The ready bit is
controlled by any MINT that seizes or releases an output link. The window bit is controlled by the access
controller 1120 of only a single MINT called, for the purposes of this description, the controlling MINT. In
this particular embodiment, the controlling MINT for a given output link is the MINT to which the
corresponding output link is routed.

The effect of an open window (window bit = 1) is to let the first access controller on the ring that wants
to seize an output link and recognizes its availability as the ready bit passes the controller, seize such a
link, and to let any controller which tries to seize an unavailable link set the priority indicator 1114 for that
unavailable link. The effect of a closed window (window bit = 0) is to permit only controllers which have a
priority indicator set for a corresponding available link to seize that available link. The window is closed by
the access controller 1120 of the controlling MINT whenever the logic and count circuit 1100 of that
controller detects that the output link is not available (ready bit = 0) and is opened whenever that controller
detects that that output link is available (ready bit = 1).

The operation of an access controller seizing a link is as follows. If the link is unavailable (ready bit =
0) and the window bit is one, the access controller sets the priority indicator 1114 for that output link. If the
link is unavailable and the window bit is zero, the controller does nothing. If the link is available and the
window bit is one, the controller seizes the link and marks the ready bit zero to ensure that no other
controller seizes the same link. If the link is available and the window bit is zero, then only a controller
whose priority indicator 1114 is set for that link can seize that link and will do so by marking the ready bit
zero. The action of the access controller of the controlling MINT on the window bit is simpler: that controller
simply copies the value of the ready bit into the window bit.
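
The four cases above amount to a small decision table, which can be sketched in Python as follows. The function names and the tuple return convention are hypothetical. The sketch also assumes that a controller clears its own priority indicator once it has seized the link; the text does not state when the indicator is cleared.

```python
def controller_step(ready, window, wants_link, priority):
    """Process one (ready, window) bit pair passing a non-controlling access controller.

    Returns (new_ready, new_priority, seized).
    """
    if not wants_link:
        return ready, priority, False
    if ready == 0:
        # Link unavailable: register priority only while the window is open.
        if window == 1:
            priority = True
        return ready, priority, False
    # Link available: an open window lets any requester seize it; a closed
    # window admits only a controller whose priority indicator is set.
    if window == 1 or priority:
        # Seize by marking the ready bit zero. Clearing the priority
        # indicator here is an assumption of this sketch.
        return 0, False, True
    return ready, priority, False


def controlling_mint_window(ready):
    """The controlling MINT simply copies the ready bit into the window bit."""
    return ready


# Busy link with an open window: the requester sets its priority indicator.
print(controller_step(ready=0, window=1, wants_link=True, priority=False))
# Free link with a closed window: only a controller with priority set may seize it.
print(controller_step(ready=1, window=0, wants_link=True, priority=True))
```

Because the window closes as soon as the controlling MINT sees the link busy, only controllers that registered priority during the open window are served when the link is next released, which is the source of the approximate fairness of the scheme.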

In addition to the ready and window bits, a frame bit is circulated in ring 1106 to define the beginning of
a frame of resource availability data, hence, to define the count for identifying the link associated with each
ready and window bit. Data on the three rings 1102, 1104 and 1106 circulates serially and in synchronism
through the logic and count circuit 1100 of each MINT.

The result of this type of operation is that those access controllers which are trying to seize an output
link and which are located between the unit that first successfully seized that output link and the access
controller that controls the window bit have priority and will be served in turn before any other controllers
that subsequently may make a request to seize the specific output link. As a result, an approximately fair
distribution of access by all MINTs to all output links is achieved.

If this alternative approach to controlling MINT 11 access control to the MANSC 22 is used, priority is
controlled from the MINT. Each MINT maintains a priority and a regular queue for queuing requests, and
makes requests for MANSC services first from the MINT priority queue.
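
The two-queue discipline described above can be sketched in Python as follows. The class name `MintRequestQueues`, its methods, and the request labels are hypothetical; the sketch shows only the ordering rule that the priority queue is drained before the regular queue.

```python
from collections import deque


class MintRequestQueues:
    """Priority and regular queues of MANSC setup requests held in one MINT."""

    def __init__(self):
        self.priority = deque()
        self.regular = deque()

    def enqueue(self, request, high_priority=False):
        # Requests are kept in arrival order within each queue.
        (self.priority if high_priority else self.regular).append(request)

    def next_request(self):
        # Requests are always taken from the priority queue first.
        if self.priority:
            return self.priority.popleft()
        if self.regular:
            return self.regular.popleft()
        return None


queues = MintRequestQueues()
queues.enqueue("data chain to link 5")
queues.enqueue("voice chain to link 9", high_priority=True)
queues.enqueue("data chain to link 2")
order = [queues.next_request() for _ in range(4)]
print(order)
```

The high priority request is served first even though it arrived second; within each queue, requests are served in arrival order.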



13 CONCLUSION

It is to be understood that the above description is only of one preferred embodiment of the invention.
Numerous other arrangements may be devised by one skilled in the art without departing from the spirit
and scope of the invention. The invention is thus limited only as defined in the accompanying claims.



APPENDIX A

ACRONYMS AND ABBREVIATIONS

1SC      First Stage Controller
2SC      Second Stage Controller
ACK      Acknowledge
ARP      Address Resolution Protocol
ARQ      Automatic Repeat Request
BNAK     Busy Negative Acknowledge
CC       Central Control
CNAK     Control Negative Acknowledge
CNet     Control Network
CRC      Cyclic Redundancy Check or Code
DNet     Data Network
DRAM     Dynamic Random Access Memory
DVMA     Direct Virtual Memory Access
EUS      End User System
EUSL     End User Link (Connects NIM and UIM)
FEP      Front End Processor
FIFO     First In First Out
FNAK     Fabric Blocking Negative Acknowledge
IL       Internal Link (Connects MINT and MANS)
ILH      Internal Link Handler
IP       Internet Protocol
LAN      Local Area Network
LUWU     Long User Work Unit
MAN      Exemplary Metropolitan Area Network
MANS     MAN Switch
MANSC    MAN Switch Controller
MINT     Memory and Interface Module
MMU      Memory Management Unit
NAK      Negative Acknowledge
NIM      Network Interface Module
OA&M     Operation, Administration and Maintenance
PASC     Phase Alignment and Scramble Circuit
SCC      Switch Control Complex
SUWU     Short User Work Unit
TCP      Transmission Control Protocol
TSA      Time Slot Assigner
UDP      User Datagram Protocol
UIM      User Interface Module
UWU      User Work Unit
VLSI     Very Large Scale Integration
VME® bus An IEEE Standard Bus
WAN      Wide Area Network
XL       External Link (Connects NIM to MINT)
XLH      External Link Handler
XPC      Crosspoint Controller



Claims 

1. A data switching network for connecting a plurality of inlets to a plurality of outlets, comprising:
circuit switch means for switchably connecting a plurality of inputs and said plurality of outlets; and
a plurality of data distribution means for assembling and chaining data packets from ones of said plurality of
inlets for transmission to one of said outlets and for transmitting said chained data packets to one of said
inputs of said circuit switch for connection to said one outlet.

2. The network of claim 1 wherein each of said data distribution means comprises:
a memory for storing incoming data packets;
a first plurality of microprocessors connected to ones of said plurality of inlets for controlling the storage of
header information of each of said data packets; and
a second plurality of microprocessors for processing said header information and queuing data packets
destined for a common outlet.

3. The network of claim 2 further comprising means operative under the control of said second plurality
of microprocessors for controlling transmission of said queued data packets destined for said common
outlet to one of said inputs of said circuit switch means.

4. The network of claim 1 wherein said data packets comprise voice packets.

5. A metropolitan area data switching network for switching data packets, comprising a central hub for
connecting a plurality of inlets to a plurality of outlets, said hub comprising:
a circuit switch for switchably connecting a plurality of inputs and said plurality of outlets;
a plurality of data distribution modules for assembling and chaining data streams, said data streams
comprising data and voice packets, from ones of said plurality of inlets for transmission to one of said
outlets; and transmitting said chained data streams to one of said inputs of said circuit switch for connection
to said one outlet; and
means for concentrating data from a plurality of end user systems to a high-speed data link, connected to
one of said plurality of data distribution modules, said means for concentrating comprising means for
adding port identification data to said transmitted packet;
wherein each of said data distribution modules comprises:
a memory for storing incoming data packets;
a first plurality of microprocessors connected to said plurality of inlets for controlling storage of header
information of each of said data packets; and
a second plurality of microprocessors for processing said header information and chaining the data packets
destined for a common outlet;
means, operative under the control of said second plurality of microprocessors, for controlling transmission
of said chained data packets destined for said common outlet to one of said inputs; and
control means for verifying that a source, identified by a source identification, of each data packet is
authorized to transmit to a destination of that data packet and for verifying that said port identification is
authorized to transmit with said source identification.
6. The network of claim 5 further comprising:
a plurality of data concentration/distribution modules each for concentrating data traffic from a plurality of
end users to an inlet of said hub, and for distributing data traffic from an outlet of said hub to said plurality
of end users.

7. A data switch having a plurality of inlets and outlets, comprising:
a plurality of data distribution switch means, each for chaining groups of data packets received on ones of
said plurality of inlets connected to said each data distribution switch means and destined for one of said
plurality of outlets; and
circuit switch means connected to said data distribution switch means for setting up a circuit connection
from one of said data distribution switch means to one of said outlets for each of said groups of chained
packets.

8. In a data switching system, a method of transmitting data packets each to one of a plurality of outlets
comprising the steps of:
chaining groups of data packets destined for a common outlet; and
transmitting a request for a connection to a circuit switch for each chained group of data packets.

9. The data switching network of claim 1 wherein said circuit switch means comprises a plurality of
controllers each for controlling one of a plurality of disjoint sets of connections in said circuit switching
network.

10. The data switching network of claim 9 wherein said circuit switch means comprises a space division
network for switchably connecting said plurality of inputs and said plurality of outlets.

11. The method of claim 8 wherein said circuit switch comprises a plurality of controllers each for
controlling one of a disjoint set of connections of said circuit switch wherein said transmitting step
comprises the step of:
transmitting a request for a connection to one of said controllers of said circuit switch, said one controller
controlling a disjoint set of connections that includes said requested connection.

12. The method of claim 11 wherein said data switching system comprises a plurality of data switching
modules each connected to at least one inlet and one output and wherein said circuit switch connects each
of said outputs of said plurality of data switching modules to said plurality of outlets further comprising the
steps of:
in each of said plurality of data distribution modules, storing packets received on said at least one inlet;
determining an outlet for which each stored packet is to be transmitted and chaining data packets which are
to be transmitted to a common outlet;
receiving an indication that a requested connection has been established transmitted from one of said
controllers to one of said data switching modules; and
transmitting a chained group of data packets from said one of said data switching modules to said circuit
switch for transmission over said established requested connection.
13. A data switching system for switching data packets from a plurality of inlets to a plurality of outlets,
comprising:
a plurality of data switching means, each having at least one output, for chaining data packets from ones of
said plurality of inlets to one of said plurality of outlets; and
circuit switching means connected to said plurality of data switching means for connecting outputs of said
plurality of data switching means to said plurality of outlets;
each of said data switching means comprising means for requesting of said circuit switching means a
connection between an output of said each data switching means and one of said plurality of outlets, said
means for requesting comprising high priority and low priority queues for storing requests to set up a
connection for transmitting a chain of data packets having high priority and low priority respectively.

14. The data switching system of claim 13, wherein said circuit switching means comprises at least one
controller, said at least one controller comprising queues for requests from ones of said plurality of data
switching modules, said queues comprising a queue for high priority requests and a queue for low priority
requests.

15. The data switching system of claim 14 wherein said packets comprise data for identifying high
priority packets and wherein said high priority requests comprise requests to switch a chain of packets
headed by a high priority packet.

16. A data switching system comprising:
a data concentration/distribution stage for concentrating data packets from a plurality of sources to one of a
plurality of duplex high-speed data links and for distributing data packets from one of said plurality of
duplex high-speed data links to a plurality of destinations; and
a hub for switching data packets among said plurality of high-speed data links;
wherein said hub comprises a plurality of data switching modules for switching data packets from ones of
said plurality of high-speed data links to outputs of each of said data switching modules and a circuit switch
for switching from said outputs of said data switching modules to ones of said plurality of high-speed data
links;
wherein each of said data switching modules comprises means for chaining data packets destined for a
common high-speed data link and for transmitting connection requests to said circuit switch;
wherein said circuit switch comprises at least one controller comprising queues for requests from ones of
said plurality of data switching modules, said queues comprising a queue for high priority requests and a
queue for low priority requests;
wherein said data packets comprise data for identifying high priority packets and wherein said high priority
requests comprise requests to switch a chain of packets headed by a high priority packet;
wherein each of said data switching modules comprises a queue for high priority circuit switch setup
requests and a queue for low priority circuit switch setup requests and comprises means for transmitting to
said at least one controller of said circuit switch requests from said queue for high priority requests before
transmitting requests from said queue for low priority requests.

17. In a data switching system, a method of transmitting data packets each to one of a plurality of
outlets, comprising the steps of:
chaining groups of data packets destined for a common outlet;
determining for each chained group of data packets whether said group is high priority or low priority;
transmitting a high priority request for a connection to a circuit switch for each chained group of data
packets having high priority; and
transmitting a low priority request for a connection to said circuit switch for each chained group of data
packets having low priority.

18. The data switching system of claim 13 wherein said packets comprise data for identifying high
priority packets and wherein said high priority requests comprise requests to switch a chain of packets
including at least one high priority packet.

19. The data switching system of claim 13 wherein each of said data packets is limited in length to a
predetermined number of bits.

20. The data switching system of claim 19 wherein said high priority requests further comprise requests
to switch a chain of packets including at least one high priority packet.

21. The data switching system of claim 16 wherein said data packets are limited in size to a
predetermined number of bits.

22. The method of claim 17 wherein said packets comprise data for identifying high priority packets and
wherein said determining step comprises the step of determining for each data packet of a chained group 
of data packets whether said data packet is high priority and classifying said chained group of data packets 
as high priority if any of said data packets of said chained group is classified as high priority.

23. The method of claim 17 wherein said packets comprise data for identifying high priority packets and
wherein said determining step comprises the step of determining for a first data packet of a chained group
of data packets whether said data packet is high priority and classifying said chained group of data packets
as high priority if a first of said data packets of said chained group is classified as high priority.
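
Illustrative sketch (not part of the claims): the chaining step of claim 17 and the two classification rules of claims 22 and 23 might be modeled as below. The names Packet, chain_by_outlet, is_high_any and is_high_first are hypothetical, as is the choice to represent the priority indication as a boolean field; the specification does not prescribe this representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    outlet: int          # destination outlet of the circuit switch (assumed integer id)
    high_priority: bool  # priority indication carried in the packet (assumed boolean)
    payload: bytes = b""


def chain_by_outlet(packets):
    """Group packets destined for a common outlet, preserving arrival order."""
    chains = defaultdict(list)
    for p in packets:
        chains[p.outlet].append(p)
    return dict(chains)


def is_high_any(chain):
    """Rule of claim 22: the chain is high priority if any packet in it is."""
    return any(p.high_priority for p in chain)


def is_high_first(chain):
    """Rule of claim 23: the chain is high priority if its first packet is."""
    return chain[0].high_priority


packets = [Packet(3, False), Packet(7, True), Packet(3, True), Packet(7, False)]
chains = chain_by_outlet(packets)
```

Under these assumptions the chain for outlet 3 is classified differently by the two rules: it contains a high priority packet, but that packet is not at the head of the chain.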

24. The method of claim 17 further comprising the steps of:

following said determining step, storing a high priority request for each group determined to be high priority
in a high priority request queue; and

for each chained group determined to be low priority storing a low priority request in a low priority request
queue.

25. The method of claim 17 further comprising the step of: 

attempting to establish connections in said circuit switch in response to said high priority requests before
attempting to establish connections in response to said low priority requests. 
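
Illustrative sketch (not part of the claims): the two request queues of claim 24 and the service order of claim 25 could be modeled as follows. RequestQueues, serve and the busy_outlets set (standing in for outlets that cannot currently be connected) are hypothetical names introduced for this example only.

```python
from collections import deque


class RequestQueues:
    """Separate FIFO queues for high and low priority connection requests."""

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def store(self, outlet, high_priority):
        # Store the request in the queue matching the chain's priority.
        (self.high if high_priority else self.low).append(outlet)

    def next_request(self):
        # High priority requests are always taken before low priority ones.
        if self.high:
            return ("high", self.high.popleft())
        if self.low:
            return ("low", self.low.popleft())
        return None


def serve(queues, busy_outlets):
    """Attempt a connection for every queued request, high priority first."""
    connected, blocked = [], []
    while (req := queues.next_request()) is not None:
        (blocked if req[1] in busy_outlets else connected).append(req)
    return connected, blocked
```

For example, after storing requests for outlets 1 (low), 2 (high), 3 (low) and 4 (high), calling serve with outlet 3 marked busy attempts outlets 2 and 4 first, then 1, and reports 3 as blocked.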

26. A system for switching voice signals comprising:

means for converting said voice signals into voice packets; and means, connected to said means for
converting, for packet switching said voice packets, comprising:
a plurality of input packet handlers and a plurality of output packet handlers;

memory access means for controlling storing and reading of said voice packets, comprising a plurality of
memory access controllers for storing consecutive words of a voice packet in consecutive members of a
plurality of memory modules; and

means for distributing said voice packets from said plurality of input packet handlers to said plurality of
memory access controllers and for assembling said voice packets from said plurality of memory access
controllers to said plurality of output packet handlers.
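
Illustrative sketch (not part of the claims): storing consecutive words of a packet in consecutive memory modules can be pictured as interleaving. The functions store_packet and read_packet are hypothetical, and the wrap-around from the last module back to the first is an assumption made for this example.

```python
def store_packet(words, num_modules):
    """Write word i of a packet into memory module i % num_modules."""
    modules = [[] for _ in range(num_modules)]
    for i, word in enumerate(words):
        modules[i % num_modules].append(word)
    return modules


def read_packet(modules, length):
    """Reassemble the packet by visiting the modules in the same rotating order."""
    n = len(modules)
    return [modules[i % n][i // n] for i in range(length)]


words = list(range(10))
modules = store_packet(words, 4)
restored = read_packet(modules, len(words))
```

With ten words and four modules, module 0 holds words 0, 4 and 8, module 1 holds words 1, 5 and 9, and so on, so consecutive words can be written or read through different modules.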

27. The system of claim 26, comprising a plurality of said means for converting and a plurality of said
means for packet switching further comprising circuit switch means for switching said voice packets
between output packet handlers of a plurality of said means for packet switching and ones of a plurality of
communication paths, and wherein said means for packet switching said voice packets comprise means for
chaining voice packets in groups, each group for connection over one of said communication paths.

28. The system of claim 27 wherein ones of said plurality of communication paths are connectable to a
packet to digital voice signal converter.

29. The system of claim 23 wherein said means for converting said voice signals into voice packets is
comprised in a digital switching system connectable to customer stations;

said digital switching systems further comprising means for generating signaling information to said means
for converting for signaling terminal identification data for switching packets of a voice connection to a
customer station, and for generating signaling information to said means for converting for signaling the
identity of a requested customer station to a switch serving that requested customer station.

30. A network for switching first packets comprising data and second packets comprising voice signals,
comprising:

first data switching means for switching said first and said second packets to first and second outputs
respectively;

circuit switching means connected to said first outputs for further switching said first packets; and
second data switching means connected to said second outputs for further switching said second packets.

31. A system for switching data and voice signals comprising:

digital switching means connectable to customer lines for generating digital speech signals;

means for generating speech channel identification information;

means connected to said digital switching means for converting speech signals into voice packets and
responsive to said speech channel identification information for generating headers to said voice packets;

means for concentrating data traffic from and distributing traffic to said means for generating voice packets;

means, connected via data links to said means for concentrating, for packet switching said voice packets
comprising:

a plurality of input packet handlers and a plurality of output packet handlers;

memory means for storing said voice packets comprising a plurality of memory modules for storing
consecutive words of a voice packet;

means for chaining packets into groups destined for a common means for distributing and for communicating
said chaining data to said output packet handlers;

means, controlled by said input packet handlers for distributing said voice packets from said plurality of
input packet handlers to said plurality of memory modules and, controlled by said output packet handlers,
for assembling said chained groups of voice packets from said plurality of memory modules to said plurality
of output packet handlers.

32. The system of claim 31 further comprising:

circuit switching means connected to said means for packet switching for groups of packets from said 
means for packet switching to ones of data links connected to said means for concentrating data. 



33. A method of switching voice and data packets comprising the steps of:

packet switching said voice packets received on inputs of a first packet switch means to first outputs of said
first packet switch means and said data packets to second outputs of said first packet switch means;
connecting said first outputs to a circuit switch means and said second outputs to a second packet switch

34. A method of switching voice signals comprising the steps of:
converting said voice signals to voice packets;

transmitting said voice packets to an input packet handler of a data switching means;
transmitting data from said input packet handler to a plurality of memory access controllers of said data
switching means for controlling storage of voice packets in a plurality of memory modules;
chaining packets into groups having a common intermediate destination; and

transmitting each of said groups from said plurality of memory access controllers to an output data handler
of said data switching means for further transmission to one of said intermediate destinations.

35. A network for switching first packets, comprising data, and second packets, comprising information
representing voice signals, from a plurality of inlets to a plurality of outlets, comprising:

first and second data switching means; and

circuit switching means;

said first data switching means for switching said first and said second packets received from said inlets to
said circuit switching means for further switching to said outlets and to said second data switching means,
respectively;

said circuit switching means responsive to said packets received from said first data switching means for
switching said first and second packets to said outlets and said second data switching means respectively;
said second data switching means responsive to said second packets received from said circuit switching
means for switching said second packets to said circuit switching means for further switching to said
outlets;

said circuit switching means further responsive to said second packets received from said second data
switching means for switching said packets to said outlets.

36. The network of claim 35 wherein each of said first and second data switching means comprise
means for generating control signals for selecting outlets and second data switching means and wherein
said circuit switching means is responsive to said control signals for switching a packet received from one
of said data switching means to an outlet or a second data switching means selected by a control signal
from said one of said data switching means.

37. The network of claim 36 wherein each of said data switching means comprise a plurality of data
switching modules, and wherein each of said data switching modules of said first data switching means
comprises means for chaining received first data packets destined for a common outlet and for chaining
received second data packets destined for a common one of said plurality of data switching modules of
said second data switching means, and means for generating control signals for controlling the switching by
said circuit switching means of said chained received packets to said common outlet or said one of said
plurality of switching modules of said second data switching means.

38. The network of claim 37 wherein each of said data switching modules of said second data switching
means comprises means for chaining received second data packets destined for another common outlet
and means for generating control signals for switching said chained received packets to said other common
outlet.

39. In a data switching system comprising circuit switching means and first and second data switching
means, a method for switching first packets comprising data and second packets, comprising information
representing voice signals from a plurality of inlets to said first data switching means to a plurality of outlets
comprising the steps of:

data switching said first packets, from said inlets to said first data switching means, to said circuit switching
means for further switching to said outlets;
data switching said second packets, from said inlets to said first data switching means, to said circuit
switching means for further switching to said second data switching means;

data switching said second packets in said second data switching means to said circuit switching means for
further switching to said outlets.
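
Illustrative sketch (not part of the claims): the paths recited in claim 39 can be written out as an ordered list of stages for each packet type. The function route and the stage labels are hypothetical and serve only to show that voice (second) packets cross the circuit switching means twice, while data (first) packets cross it once.

```python
def route(kind):
    """Return the stages traversed by a packet of the given kind.

    kind is "data" for first packets or "voice" for second packets.
    """
    if kind == "data":
        return ["inlet", "first data switch", "circuit switch", "outlet"]
    if kind == "voice":
        return [
            "inlet",
            "first data switch",
            "circuit switch",
            "second data switch",
            "circuit switch",
            "outlet",
        ]
    raise ValueError(f"unknown packet kind: {kind}")


data_path = route("data")
voice_path = route("voice")
```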

40. The method of claim 39 further comprising the steps of generating control signals in said first data
switching means for causing said circuit switching means to switch ones of said packets to outlets or said
second data switching means.



41. The method of claim 40 wherein said second data switching means comprises at least one module,
further comprising the steps of chaining first packets destined for a common outlet and chaining second
packets destined for a module of said second data switching means.



FIG. 6



FIG. 7



[FIG. 20: packet format diagram. Legible field labels: source/destination port(s), MAN header, destination, EUS to EUS header, packet, protocol, initial byte no., dstport, srcport, data, data check sequence.]



[FIG. 26: block diagram. Legible labels: voice and data packet switch modules and voice packet switch modules, with connections to/from NIM 2.]



